The experiment described in the following
pages was suggested by a discussion in a 
meeting of the lecturers in the General 
Humanities course at West Virginia Univer- 
sity. The examiner was attempting to make 
all the tests in the course of an objective type. 
A number of the lecturers insisted that the 
essay question should be used because of their
belief that it tests certain abilities that can 
not be measured by other types of questions. 
It was decided that some essay questions 
should be used, but in order to check their 
value it was thought best to ask three persons 
to read the answers to each question.¹

The first question, given to the Thursday 
discussion sections of Humanities 1, was: 

Discuss the ideas of geography held by 
Dante: (1) as to the universe in general, and, 
(2) as to Hell in particular. (Time: 15
minutes)












































The second question, given to the Friday 
discussion sections of Humanities 1, was: 


Discuss the plot of Dante’s Divine Comedy. 
(Time: 15 minutes) 

Three members of the faculty, all author- 
ities on Dante, were asked to read the an- 
swers to the questions and to assign to each 
a letter grade according to the University 
grading system, in which the highest grade is 
A and the lowest is F. 

When the papers were returned there 
seemed to be, at first glance, a high degree of 
agreement. The average grades given by 
Professors R, S, and T were D, D+, and D,
respectively. The number of papers assigned 
failing grades (E or F) by Prof. R was 
twenty-four; by Prof. S and Prof. T, twenty- 
one and twenty-three, respectively. A closer 
study of the grades, however, revealed some 
interesting things.

¹ ... under the names of Professors R, S, T, U, V, W, X, ...

AN EXPERIMENT IN THE ESSAY-TYPE QUESTION

While each reader gave approximately the same number of failing
grades, they were not assigned to the same 
papers. They agreed in giving a failing 
grade to only eight papers, five of which were 
completely blank. Of the sixty-five students 
taking the test, forty-four were assigned a 
failing grade by one or more of the readers, 
but of these forty-four there were thirty-six 
who received a passing grade from one or 
more of the readers. The paper rated by 
Prof. R as the fourth best, as indicated by his 
grades, was rated thirtieth by both Prof. S 
and Prof. T. The paper rated the best by 
Prof. S was considered fifteenth and seven- 
teenth by Profs. R and T, respectively. There 
were three cases of grades of A and D being 
assigned to the same paper by different read- 
ers and two cases of B and F being given to 
the same paper. The correlation between the
grades assigned by Prof. R and Prof. S to the
same sixty-five papers, according to the
rank-difference method, was .65 ± .05; between
the grades of Prof. R and Prof. T, .67 ± .05;
and between the grades of Prof. S and Prof.
T, .69 ± .04.
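
The rank-difference method named above can be sketched as follows. This is a minimal illustration, assuming the usual rank-difference formula, rho = 1 - 6Σd²/(n(n² - 1)), with tied grades given their mean rank; the article does not say how ties were handled, and with many ties the formula is only approximate. The function names and the grade points are hypothetical, not data from the experiment.

```python
def average_ranks(scores):
    """Rank scores from best (rank 1) to worst; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def rank_difference_rho(reader_a, reader_b):
    """Rank-difference correlation between two readers' grades of the same papers."""
    n = len(reader_a)
    ranks_a = average_ranks(reader_a)
    ranks_b = average_ranks(reader_b)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical grade points (A=4 ... F=0) given by two readers to six papers.
prof_r = [4, 3, 3, 1, 0, 2]
prof_s = [3, 4, 1, 2, 0, 3]
print(round(rank_difference_rho(prof_r, prof_s), 2))
```

Two readers who rank every paper identically would obtain 1.0; the values of .65 to .69 reported here fall well short of that.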

When the results of the grading became 
known two objections were raised: first, the 
two discussion groups had had different ques- 
tions and it was difficult to compare the 
papers of one group with those of the other; 
second, the questions were not carefully 
worded. To meet these objections and to 
make the experiment as fair as possible it was 
decided to give two essay questions on the 
final examination, when all the students would 
be together, and to appoint two committees of 
three faculty members each, to prepare the 
questions, one on some phase of history and 
one on some phase of English literature. 

The history committee submitted the fol- 
lowing question: 

Compare or contrast in approximately 400 
words life in a medieval castle and life in a 
medieval town in regard to the following 
points: 

















(a) security or the lack of it. 

(b) sanitation or the lack of it. 

(c) training of boys for knighthood or 
industry. 

(d) recreations or diversions. 


The English committee submitted the fol- 
lowing: 

Name seven types of medieval characters 
presented in the Prologue to The Canterbury 
Tales and discuss two of them. 

The examiner called the attention of the 
committee to the fact that the first part of the 
question called for pure factual information 
which could easily be tested in objective form 
and suggested, since the use of the essay ques- 
tion was based on the belief that it tests 
something that can not be tested by other 
methods, that the naming of the seven types 
of characters be included elsewhere in the 
examination and the essay question be re- 
stricted to discussing two types of medieval 
characters found in the Prologue. The com- 
mittee immediately asked for time to 
reconsider. 

When the report of the English committee 
was finally returned it was accompanied by a 
minority report. One member of the com- 
mittee could not agree with the others on 
what constitutes a good essay question. The 
majority report, which was accepted by the 
examiner, proposed the following: 

Answer any two of the following questions, 
in paragraphs or short compositions of 150-200
words each. Wherever possible, support
the statements you make by reference to the 
parts of The Canterbury Tales which you 
have read. 

(1) What was Chaucer’s plan and purpose 

in The Canterbury Tales? 

(2) Show that Chaucer had a sense of 

humor. 

(3) Show that the Prologue affords a good 

picture of medieval society. 

(4) Are the characters in the Prologue and 

the Tales individuals or merely types? 


As before, the members of each committee 
were asked to read the papers independently 
and assign a letter grade to each. On the 
history question, the average grades given by 
Prof. U, Prof. V, and Prof. W were C-, D,
and C-, respectively. Of the seventy-five
students taking the examination, eleven re- 
ceived a failing grade from Prof. U, twenty- 
four from Prof. V, and nineteen from Prof. W. 
Twenty-nine different students received a fail-
ing grade from one or more of the readers. 
Of these twenty-nine, twenty-one received a 
passing grade from one or more of the read- 
ers. Only two papers received the grade of 
F from all three readers. One of these was a 
blank paper. There were two cases of the 
grades A and C- being assigned to the same
paper by different readers, one case of B and 
E, and two cases of C and F. The correlation
between the grades assigned by Prof. U
and those assigned by Prof. V was .72 ± .04;
between the grades of Prof. U and Prof. W,
.71 ± .04; between the grades of Prof. V and
Prof. W, .61 ± .05.


On the Chaucer question, the average 
grades given by Prof. X, Prof. Y, and Prof. Z 
were C-, D, and C-, respectively. Seventeen
papers received a failing grade from
Prof. X, twenty-nine from Prof. Y, and six- 
teen from Prof. Z. Of the seventy-five pa- 
pers, thirty-three received a failing grade 
from one or more of the readers. Of these, 
twenty-two received a passing grade from one 
or more of the readers. In regard to the grade 
of F, the English committee agreed much bet- 
ter than the history committee, unanimously 
assigning this grade to nine students. There 
were, however, five blank papers on the 
Chaucer question as compared with one on 
the history question. There was one case of 
the grades A and D being assigned to the 
same paper by different readers, one case of 
A and C, and three of B and E. The correlation
between the grades assigned by Prof. X
and Prof. Y was .77 ± .03; between those of
Prof. X and Prof. Z, .81 ± .03; and between
those of Prof. Y and Prof. Z, .84 ± .02.


At this time it was becoming increasingly 
obvious that the members of the faculty do 
not have the same standard in grading essay 
tests, and the question was raised: Does each 
one, separately, have a consistent standard? 
In an attempt to find a partial answer to this 
question, the committees were asked to read 
again the papers of the final examination 
after a period of two weeks. The members 
of the history committee refused point-blank, 
saying that they were already convinced that 
they could not grade essay tests with any de- 
gree of accuracy. The members of the Eng- 
lish committee, however, agreed to the pro- 
posal. 

Even before the results of the second read- 
ing of the Chaucer question were reported, 
reasons were advanced why there would be
little improvement in agreement. Prof. X 
stated that he had been misled by certain 
remarks of the examiner and had not graded 
right the first time, but he believed he was 
doing it properly the second time. Prof. Y 
said that he was conscious of being in a better 
humor for the second reading and, moreover, 
he had more time and was reading with more 
care. Prof. Z read with much more care, 
going over all the papers in inverse order to 
check his own grading. He also pointed out 
certain pertinent facts: there was no agree- 
ment as to what to do when a student failed 
to follow instructions; some of the readers 
had marked on the papers and this hindered 
his forming an independent judgment; some 
of the papers were written with pen and 
others with pencil and this made it difficult 
to appraise their relative worth. 


On the second reading, only one member 
of the committee agrees with himself better 
than with his colleagues. This is Prof. Y. 
The agreement between the first and second 
readings in the cases of Prof. X and Prof. Z 
is about the same as that between the read- 
ings of the different members of the commit- 
tee. According to the grades assigned by 
Prof. X, the papers had deteriorated in the 
two weeks that they had lain on the exam- 
iner’s desk. His average grade drops from 
C- to D+. He finds that five students
have passed on the first reading that now 
deserve to fail. There are some outstanding 
examples, however, of papers that have im- 
proved with age. The paper which he indi- 
cates by his grade as the best on the second 
reading rated sixteenth the first time. The 
correlation between the grades of his first 
reading and those of his second reading is 
.84 ± .02.


Prof. Y, who is in a better humor, finds the
papers generally improved. Five students
pass who failed before, but he finds two F
papers that he missed the first time. The
correlation between the grades of his first and
second readings is .88 ± .01.

Prof. Z, like Prof. X, thinks the papers 
worse, finding ten more failures than in the 
first reading. Some papers, however, have 
improved. He assigns the grade of A- to
a paper that he had given D before, thus 
raising it from the obscurity of the fifty- 
fourth paper to the distinction of the fifth 
best. 

Although all three readers used greater care 
in the second reading, the lack of agreement 
is greater. The correlation between the
grades assigned by Prof. X and Prof. Y is
.70 ± .04 as compared with .77 ± .03 the
first time; between those of Prof. X and
Prof. Z, .72 ± .04 as compared with .81 ±
.03 on the first reading; between those of
Prof. Y and Prof. Z, .82 ± .03 as compared
with .84 ± .02 on the first reading.

One would scarcely venture to draw any 
very definite conclusions on the basis of this 
one experiment, but it seems pertinent to call 
attention to the following facts: 


(1) The essay questions used in this ex- 
periment were more carefully prepared than 
is usual in many course examinations. 

(2) The answers were read with more care 
than is usual in many course examinations. 

(3) In the cases covered by this experi- 
ment: 

(a) about six percent of the students fail. 

(b) about forty-four percent pass. 

(c) the passing or failing (not merely the 
difference of a letter grade but the 
difference between credit and no 
credit) of about forty percent depends, 
not on what they know or do not 
know, but on who reads the papers. 
(d) the passing or failing of about ten
percent depends, not on what they 
know or do not know, but on when 
the papers are read. 
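
The kind of tally behind item (3) can be sketched as follows: each paper is sorted by whether its three readers agree on credit or no credit. This is only an illustration, assuming (as stated earlier in the article) that E and F are the failing grades; the function names and the sample grades are hypothetical and are not the experiment's data.

```python
from collections import Counter

FAILING = {"E", "F"}  # failing letter grades, as stated in the article


def classify(grades):
    """Label one paper by whether its readers agree on passing or failing it."""
    # Only the letter matters for credit, so "C-" counts as "C".
    fails = sum(1 for grade in grades if grade[0] in FAILING)
    if fails == len(grades):
        return "fails with every reader"
    if fails == 0:
        return "passes with every reader"
    return "depends on the reader"


def tally(papers):
    """Count how many papers fall into each agreement category."""
    return Counter(classify(grades) for grades in papers)


# Hypothetical grades from three readers for five papers.
papers = [
    ("F", "F", "E"),
    ("C", "D", "B-"),
    ("D", "E", "C"),
    ("A", "D", "B"),
    ("E", "C-", "D"),
]
for category, count in tally(papers).items():
    print(category, count)
```

Dividing each count by the number of papers gives percentages comparable in form to those listed under (3).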











THE VALUE OF AMOUNT OF TARDINESS AND ABSENCE
FOR DIRECT AND DIFFERENTIAL PREDICTION
OF ACADEMIC SUCCESS*

CLAUDE L. NEMZEK
University of Detroit

Studies of the relation of amount or regularity
of school attendance to academic
achievement have revealed, in general, negative
results. Investigations summarized by
Stephens lead to the conclusion: “The data
uniformly indicate that any variations in the
amount of attendance have had surprisingly
slight effects.”¹ None of these studies has
been such as would conclusively demonstrate
the absence of any relationship between the
factors in question. First of all, there has
been only slight variability in attendance
within most of the groups studied.

While it may not be possible to measure
readily achievement differences between
groups when their attendance differs by three
or five per cent, it does not follow that
marked differences in attendance will not be
accompanied by measurable differences in
achievement. In the second place, the studies
that have dealt with this relationship have in
some instances been limited to one or two
semesters. When this factor is coupled with
the one previously mentioned, the chance that
any slight existing relationship would be
observable is still further reduced. In the
third place, no single study has dealt with
data from a group of subjects homogeneous
as to age, sex, race, and other factors which
might in some manner have an influence upon
the relation in question.

Although various investigators have made
studies of the effects of tardiness and absence
on scholastic achievement, the problem still
merits further analysis. It is also evident
that some workers have not controlled intelligence,
either by experimental or statistical
procedures, when they have attempted to reveal
any possible effects that tardiness and
absence may have on scholastic performance.
Moreover, some workers have not analyzed
their data separately by sex. These and
other criticisms seem to be applicable to the
previous studies which are available at the
present time. Furthermore, no one has investigated
the effectiveness of tardiness and
absence for purposes of differential prediction.
Therefore, in this report various data will be
discussed in order to show what value amount
of tardiness and amount of absence have for
direct and differential prediction of academic
success as measured by teachers’ marks.

* From a thesis submitted to the Graduate Faculty of the
University of Minnesota in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.

¹ J. M. Stephens, The Influence of the School on the Individual.
Ann Arbor, Michigan: Edwards Brothers, Inc., 1933.


Data were available for 147 boys and 129 
girls who had been graduated from Univer- 
sity High School, University of Minnesota. 
For all of the cases complete data were 
obtained for the following eight variables: 


(1) Intelligence quotients, 

(2) Average number of tardinesses per 
year, 

(3) Average number of absences per year, 

(4) Honor point averages in mathematics, 

(5) Honor point averages in science, 

(6) Honor point averages in English, 

(7) Honor point averages in history and 
social science, and 

(8) Honor point averages in languages. 


Only those cases were retained for this study 
who had at least two years of work in each 
of the five subject matter fields for which 
honor point averages were computed. 


The measure of intelligence used in this 
study was based upon the results of five group 
intelligence tests. The tests employed were 
Army Alpha 8; Pressey Senior Classification; 
Haggerty Delta 2; Terman Group Test, 
Form A; and Miller’s Mental Ability Test, 
Form A. Intelligence quotients were com- 
puted from the results of each test for each 
individual. The authors’ manuals for the 
respective tests were followed as closely as 
possible in administering and scoring, and, for 
all cases except that of the Pressey Test, in 
computing the intelligence quotients. In 
this instance, where the author’s norms 
proved inadequate for children who made
unusually high scores, the difficulty was re- 
solved by extrapolation. The intelligence 
quotients were in all instances converted into 
Stanford-Binet equivalents by means of the
method proposed by Miller.² Of the five
intelligence quotients, the middle value was 
chosen as the measure to be used for each 
individual. 
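
Choosing the middle of five quotients amounts to taking their median. A minimal sketch, with a hypothetical function name and hypothetical Stanford-Binet equivalents rather than any pupil's actual scores:

```python
from statistics import median


def representative_iq(quotients):
    """Return the middle value of a pupil's five Stanford-Binet equivalent IQs."""
    if len(quotients) != 5:
        raise ValueError("expected one quotient from each of the five tests")
    return median(quotients)


# Hypothetical equivalents from the five group tests for one pupil.
pupil_quotients = [112, 118, 109, 125, 115]
print(representative_iq(pupil_quotients))
```

Unlike an average, the middle value is not pulled up or down by a single unusually high or low test result.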

Marks at University High School are given 
in the form of letter ratings. For the present 
study the letter ratings were converted into 
honor point averages. Each quarter hour 
mark of A was given three honor points; each 
quarter hour mark of B, two honor points; 
each quarter hour mark of C, one honor 
point; each quarter hour mark of D, no 
honor points; and each quarter hour mark of 
F, minus one honor point. Then the total 
number of honor points in each of the five 
subject matter fields was divided by the total 
number of quarter hours of marks involved 
in the respective subject matter fields in 
order to obtain the five honor point averages. 
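
The conversion just described can be sketched as follows. The point values are those given in the text; the function name, the data layout, and the sample marks are assumptions made for illustration.

```python
# Honor points per quarter hour, as given in the text.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def honor_point_average(marks):
    """Compute the honor point average for one subject matter field.

    `marks` is a list of (letter, quarter_hours) pairs for that field.
    """
    total_points = sum(HONOR_POINTS[letter] * hours for letter, hours in marks)
    total_hours = sum(hours for _, hours in marks)
    return total_points / total_hours


# Hypothetical record: four courses of three quarter hours each in one field.
science_marks = [("A", 3), ("B", 3), ("C", 3), ("F", 3)]
print(honor_point_average(science_marks))
```

Because each mark is weighted by its quarter hours, a long course counts for more than a short one in the average.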


In Tables I and II are presented for the 
boys and girls, respectively, the Pearson 
product-moment coefficients of correlation 
obtained by computing all of the intercorre- 
lations for the eight variables which were 
available. In Table III are included the 
means and standard deviations of the eight 
variables. 

An analysis of the data contained in 
Table I for the boys shows that the average 
number of tardinesses per year bears no sig- 
nificant relationship to any of the honor point 
averages in the five different subject matter 
fields. The coefficients are -.008 ± .056,
-.094 ± .055, .074 ± .055, -.001 ± .056,
and -.016 ± .056 between the average number
of tardinesses per year and honor point
averages in mathematics, science, English, 
history and social science, and languages, 
respectively. 

Only two of these five coefficients are even 
larger than their own probable errors. Cer- 
tainly, no one of the coefficients is large 
enough to be considered anything but a 
chance relationship; therefore, one may con- 
clude that in the case of these boys, the 
average number of tardinesses per year has 
no value for predicting academic success as 
measured by teachers’ marks. 
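
The comparison of each coefficient with its probable error can be checked with a short sketch. It assumes the classical formula for the probable error of a correlation coefficient, PE = .6745(1 - r²)/√N; the article does not state the formula, but this one reproduces the reported values of .055 and .056 for N = 147. The function names are hypothetical.

```python
from math import sqrt


def probable_error(r, n):
    """Classical probable error of a correlation coefficient r based on n cases."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def ratio_to_probable_error(r, n):
    """How many times its own probable error the coefficient is (sign ignored)."""
    return abs(r) / probable_error(r, n)


# Tardiness versus the five honor point averages for the 147 boys (from the text).
boys_tardiness = {
    "mathematics": -0.008,
    "science": -0.094,
    "English": 0.074,
    "history and social science": -0.001,
    "languages": -0.016,
}
for field, r in boys_tardiness.items():
    print(field, round(probable_error(r, 147), 3), round(ratio_to_probable_error(r, 147), 2))
```

Only the science and English coefficients come out larger than their probable errors, in agreement with the statement above.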


² W. S. Miller, “The Variation and Significance of Intelligence
Quotients Obtained from Group Tests,” Journal of Educational
Psychology, XV (1924), 359-66.
The data included in Table I for the boys
also indicate that the average number of 
absences per year bears no significant rela- 
tionship to any of the honor point averages 
in the five different subject matter fields. 
The coefficients are -.108 ± .055, -.153 ±
.054, .022 ± .056, -.107 ± .055, and
-.110 ± .055 between the average number
of absences per year and honor point aver- 
ages in mathematics, science, English, history 
and social science, and languages, respec- 
tively. The magnitude of these five coeffi- 
cients indicates that not one of them has any 
significant or practical value for purposes of 
prediction. 


The results obtained for the girls are pre- 
sented in Table II and reveal some results 
which are statistically significant. It is 
shown that the average number of tardinesses
per year correlates -.182 ± .057,
-.350 ± .052, -.204 ± .057, -.221 ±
.056, and -.262 ± .055 with honor point
averages in mathematics, science, English, 
history and social science, and languages, re- 
spectively. When intelligence is held con- 
stant by means of partial correlation, these 
five coefficients become -.141 ± .058,
-.348 ± .052, -.166 ± .058, -.187 ±
.057, and -.239 ± .056, respectively. In
other words, these coefficients are 2.43, 6.60, 
2.86, 3.28, and 4.27 times their own probable 
errors, respectively. It is interesting to note 
that all of the coefficients are negative; fur- 
thermore, two of the coefficients are statis- 
tically significant. The data indicate that the 
average number of tardinesses per year is 
significantly related to honor point averages 
in science and languages. The results are 
even more interesting when they are com- 
pared with the data based upon the boys. 
It is quite apparent that a sex difference 
exists; in fact, the five coefficients obtained 
for the girls exceed those for the boys in 
every instance. 
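
Holding intelligence constant by partial correlation can be sketched with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / √((1 - r_xz²)(1 - r_yz²)). The article does not print the formula, so its use here is an assumption; the input values are read from the damaged copy of Table II (girls) and should be treated as approximate. The function name is hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Values as read from Table II for the 129 girls (assumed readings of a damaged table).
r_iq_tardiness = -0.120
r_iq_science = 0.590
r_tardiness_science = -0.350

# Tardiness versus science with the intelligence quotient held constant.
print(round(partial_r(r_tardiness_science, r_iq_tardiness, r_iq_science), 3))
```

Under these readings the result agrees closely with the partial coefficient reported for science. Because tardiness is almost uncorrelated with the intelligence quotient, holding intelligence constant changes the coefficients very little.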


Table II also includes the coefficients of 
correlation between the average number of 
absences per year and honor point averages 
in mathematics, science, English, history and 
social science, and languages for the girls. 
The coefficients are -.165 ± .058, -.211 ±
.057, -.137 ± .058, -.160 ± .058, and
-.131 ± .058, respectively. These values
become -.136 ± .058, -.193 ± .057,
-.101 ± .059, -.130 ± .058, and -.094 ±
.059, respectively, when intelligence is held
constant by means of partial correlation.
Despite the fact that not one of these coeffi- 
cients is statistically significant, it is of some 
importance that all of them are negative. 
With one exception, the same is true of the 
boys; in fact, the data indicate that eighteen 
of the twenty coefficients showing the rela- 
tion of the average number of tardinesses per 
year and the average number of absences per 
year to honor point averages in the five sub- 
ject matter fields are negative. The consist- 
ency of these results indicates that even 
though the magnitude of the relationship is 
slight the direction of the relationship seems 
to be definitely established; that is, the re- 
sults strongly indicate that an increase in 
amount of tardiness or absence will generally 
be accompanied by a slight decrease in school 
marks. Furthermore, the degree of this in- 
verse relationship is slightly higher in the case 
of girls than in the case of boys. These 
findings corroborate previous data reported 
by Finch and Nemzek.³


It seems logical to assume that tardiness 
and absence would have a greater detrimental 
effect on scholastic achievement than these 
results have indicated. Of course, the Uni- 
versity High School of the University of Min- 
nesota is not a typical secondary school. It 
differs from typical public secondary schools 
in many respects. The nature of the pupil 
population, as well as the nature of the fac- 
ulty, is different from that of the public sec- 
ondary schools. The pupils are highly se- 
lected from the standpoint of intelligence and 
they come from homes which rate highly 
from the standpoint of socio-economic status. 
No doubt, few of the students of the Univer- 
sity High School have to be absent or tardy 
due to outside work. On the other hand, it 
is entirely possible that some students in 
public high schools are occasionally absent or 
tardy due to outside work. One also seems 
justified in assuming that pupils in University 
High School come from homes where health 
conditions are far above average. These 
pupils may come from homes having more 
favorable dietary and nutritional conditions. 
In public high schools, the children come 
from homes more heterogeneous in health, 
and dietary and nutritional conditions; there- 
fore, pupils in public high schools may be 
absent and tardy more frequently due to ill
health. Is it not possible that the incidence
of tardiness and absence is less in University
High School than in typical public high
schools?

³ F. H. Finch and C. L. Nemzek, “Attendance and Achievement in
the High School,” School and Society, XLI
(February 9, 1935), 207-8.

The slight relationship of amount of tardi- 
ness and absence to scholastic achievement in 
University High School may be due to the 
fact that the variability of amount of tardi- 
ness and absence is very small in University 
High School. A glance at Table III indi- 
cates that the average number of tardinesses 
per year is 2.66 (S. D. 3.38) and 2.44 (S. D.
3.00) for the boys and girls, respectively.
Data from Pershing High School, Detroit,
Michigan, collected by Curtis,⁴ for two groups
of 300 each, show that the average number of 
times tardy per semester is 1.06 (S. D. 1.53) 
and .82 (S. D. 1.32). These data show that 
in this public high school the average amount 
of tardiness, as well as the variability in 
tardiness, is less than at University High 
School. Therefore, if this public high school 
is typical of public high schools, one may say 
that this incidence of tardiness at University 
High School is certainly not less than it is in 
typical public high schools. Furthermore, 
one may say that the low relationship found 
between amount of tardiness and scholastic 
success is not due to a curtailed range in 
amount of tardiness peculiar to University 
High School. It may also be interesting to 
point out that Curtis found average times 
tardy per semester to correlate -.12 and
-.01 for her two groups of 300.

Table III shows that the average number 
of absences per year is 4.53 (S. D. 5.01) and 
6.87 (S. D. 5.55) for the boys and girls, re- 
spectively. For her two groups, Curtis re- 
ports that the average number of absences 
per semester is 5.49 (S. D. 6.48) and 3.84 
(S. D. 4.70). These data indicate clearly a 
greater incidence of absence in this public 
high school; however, the correlations be- 
tween average number of absences per 
semester and school achievement are -.16
and -.05. These data suggest that the low
degree of variability does not necessarily 
account for the low relationships of amount 
of tardiness and absence to scholastic achieve- 
ment at University High School. 

Despite the fact that certain variables may
be of little value for purposes of direct prediction,
they may be more valuable as prognostic
of differential ability. This is primarily
due to the fact that in finding a differential
correlation coefficient a negative and
a positive direct relationship coefficient may
be brought together and the effect is an
additive one. A survey of Tables I and II
reveals that of the 20 coefficients of correlation
showing the relation of the average number
of tardinesses per year and the average
number of absences per year to the five honor
point averages, two are positive and eighteen
are negative; consequently, there are only
eight opportunities for negative and positive
direct relationship coefficients to have an
additive effect to produce higher differential
coefficients of correlation.

⁴ ... Curtis, “The Relation of Certain Unsettled ...
to the Academic ...

Despite the fact that the direct relation- 
ship coefficients are so low that they almost 
preclude any significant differential relation- 
ship coefficients, the value of the amount of 
tardiness and absence for purposes of differential
prediction was determined by Segel’s⁵
method. In Tables IV and V are included 
the differential prediction coefficients based 
upon the data available for the 147 boys and 
the 129 girls under consideration. In Table 
VI are presented the means and standard 
deviations in step interval units as they were 
substituted in Segel’s formula for obtaining 
differential prediction coefficients.⁶

⁶ The formula used is:

$$ r_{(a-b)x} = \frac{r_{ax}\,\sigma_a - r_{bx}\,\sigma_b}{\sqrt{\sigma_a^{2} + \sigma_b^{2} - 2\,r_{ab}\,\sigma_a\,\sigma_b}} $$
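
A minimal sketch of this formula follows, to show the additive effect when the two direct coefficients have opposite signs. The two direct coefficients are those reported in the text for the boys (tardiness with science and with English). The standard deviations and the science-English correlation are hypothetical placeholders, since Table VI and the relevant entry of Table I are not legible here, so the result is illustrative only. The function name is also hypothetical.

```python
from math import sqrt


def differential_r(r_ax, r_bx, r_ab, sd_a, sd_b):
    """Correlation of predictor x with the difference (a - b) between two criteria."""
    numerator = r_ax * sd_a - r_bx * sd_b
    denominator = sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b)
    return numerator / denominator


# Reported direct coefficients for the boys: tardiness with science and with English.
r_science = -0.094
r_english = 0.074

# Hypothetical placeholders: equal standard deviations and an assumed r_ab of .60.
result = differential_r(r_science, r_english, r_ab=0.60, sd_a=1.0, sd_b=1.0)
print(round(result, 3))
```

Because one direct coefficient is negative and the other positive, the numerator is the sum of their magnitudes, which is why the differential coefficient can be larger in absolute value than either direct coefficient.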





The effectiveness of the average number of
tardinesses per year and the average number
of absences per year for purposes of differential
prediction was determined. Tables IV
and V present the differential prediction
coefficients of correlation for the boys and girls,
respectively; Table VI contains the means
and standard deviations in step interval
units as they were substituted in Segel’s
formula in order to obtain the differential
prediction coefficients.

⁵ David Segel, Differential Diagnosis of Ability in School
Children. Baltimore: Warwick and York, Inc., 1934. Pp. 86.

David Segel, Prediction of Success in College. Office of
Education Bulletin 1934, No. 15. Washington, D. C.:
Government Printing Office, 1934. Pp. 98.

David Segel, “The Construction and Interpretation of Differential
Ability Patterns,” Journal of Experimental Education,
II (March, 1934), 283-87.

David Segel, “Differential Prediction of Ability as Represented
by College Subject Groups,” Journal of Educational
Research, XXV (January, February, 1932), 14-26, 93-98.

David Segel, “Differential Prediction of Scholastic Success,”
School and Society, XXXIX (January 20, 1934), 91-96.

David Segel and J. R. Gerberich, “Differential College
Achievement Predicted by the American Council Psychological
Examination,” Journal of Applied Psychology, XVII
(December, 1933), 637-45.

J. Murray Lee and David Segel, “The Utilization of Data
from Simple or Direct Prediction in the Development of
Equations for Differential Prediction,” Journal of
Educational Psychology, XXIV (October, 1933), 550-54.

A perusal of the data for the boys in
Table IV suffices to show that the average
number of tardinesses per year has no value
for differential prediction. Only one of the
ten differential prediction coefficients is even
more than three times its probable error; that
is r_{2(5-6)}, which is -.190 ± .054. When
intelligence is held constant by means of partial
correlation, this coefficient becomes
-.212 ± .053, a value which is just four
times its probable error. This coefficient
certainly adds little to the effectiveness of
prediction; furthermore, r_{1(5-6)}, which is
.132 ± .055, indicates that the intelligence
quotient has no significant value for predict- 
ing the difference between honor point aver- 
ages for science and English. Therefore, the 
addition of the average number of tardinesses 
per year to the intelligence quotient in order 
to predict the difference between honor point 
averages for science and English would pro- 
duce a negligible increase in the effectiveness 
of prediction. 


The average number of absences per year 
has practically no value for differential pre- 
diction as demonstrated by the data included 
in Table IV. Not one of the ten coefficients 
is statistically significant, but three of them 
are more than three times their respective 
probable errors; namely, r_{3(5-6)}, -.208
± .053; r_{3(6-7)}, .208 ± .053; and
r_{3(6-8)}, .176 ± .054. When intelligence
is held constant by means of partial correlation,
these coefficients become -.225 ± .053,
.208 ± .053, and .180 ± .054, respectively.
These coefficients are 4.25, 3.92, and 3.33 
times their probable errors, respectively. It 
is apparent that they are of negligible value 
for prediction; furthermore, a glance at 
Table IV indicates that the intelligence quo- 
tient has practically no value for differential 
prediction; therefore, one may conclude that 
the data included in Table IV indicate that 
the effectiveness of differential prediction 
obtainable by combining the intelligence quo- 
tient, the average number of tardinesses per 
year, and the average number of absences per 
year has negligible practical value. 


The value of the average number of tardi- 
nesses per year and the average number of 
absences per year for differential prediction, 
in the case of the girls, is portrayed by the 
data in Table V. Two of the coefficients are 
TABLE I

Pearson Product-Moment Coefficients of Correlation With Their Probable Errors Based Upon:

(1) Intelligence Quotients,
(2) Average Number of Tardinesses Per Year,
(3) Average Number of Absences Per Year,
(4) Honor Point Averages in Mathematics,
(5) Honor Point Averages in Science,
(6) Honor Point Averages in English,
(7) Honor Point Averages in History and Social Science, and
(8) Honor Point Averages in Languages for 147 Boys.

Variables 2 3 4 5 6 7 8
TABLE II

Product-Moment Coefficients of Correlation With Their Probable Errors Based Upon:

(1) Intelligence Quotients,
(2) Average Number of Tardinesses Per Year,
(3) Average Number of Absences Per Year,
(4) Honor Point Averages in Mathematics,
(5) Honor Point Averages in Science,
(6) Honor Point Averages in English,
(7) Honor Point Averages in History and Social Science, and
(8) Honor Point Averages in Languages for 129 Girls.

Variables 2 3 4 5 6 7 8
1   -.120 -.095 .507 .590 .591 .552 .593
    .059 .044 .039 .039 .041
2   .127 -.182 -.350 -.204 -.221
    .058 .057 .052 .057 .056
3   -.165 -.211 -.137 -.160
    .058 .057 .058 .058
4   .771 .758 .765

statistically significant; namely, r2(4-5),
.253 ± .056; r2(5-6), -.233 ± .056.
When intelligence is held constant by means
of partial correlation, these coefficients be-
come .241 ± .046 and -.226 ± .056, re-
spectively; consequently, they are 5.24 and
4.04 times their probable errors, respectively.
One must conclude, however, that these two
coefficients have little practical value for
prediction. Combining the average number
of tardinesses with the intelligence quotient
for purposes of differential prediction would
certainly not result in effective prognosis be-
cause r1(4-5) and r1(5-6) are only
-.133 ± .058 and .071 ± .059, respectively.
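
Holding intelligence constant "by means of partial correlation" can be illustrated with the standard first-order partial correlation formula. This is a sketch under two assumptions: the article does not print the formula it used, and the value r12 = -.120 (intelligence with tardiness, girls) is read from the first row of Table II on the assumption that its first entry belongs to column 2.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_iq_tardy = -0.120  # r12, Table II (assumed column placement)

# Tardiness vs. the mathematics-science difference, girls (Table V).
p_45 = partial_r(r_xy=0.253, r_xz=r_iq_tardy, r_yz=-0.133)
print(round(p_45, 3))  # -> 0.241

# Tardiness vs. the science-English difference, girls (Table V).
p_56 = partial_r(r_xy=-0.233, r_xz=r_iq_tardy, r_yz=0.071)
print(round(p_56, 3))
```

The first result matches the .241 given in the text; the second comes out close to the reported -.226, the small gap being attributable to rounding in the published coefficients.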


Considering the value of the average num- 
ber of absences per year for differential pre- 
diction, one finds in Table V that not one of 
the ten coefficients has any statistical sig- 
nificance. Therefore, the data based upon 
these 129 girls justify the conclusion that the 
combination of the intelligence quotient, the 
average number of tardinesses per year, and 
the average number of absences per year in 
order to predict any differential abilities that 
may be measured by the honor point averages 
for mathematics, science, English, history and 
social science, and languages has revealed 
nothing of practical value. 
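
A differential prediction coefficient such as r1(4-5) is the correlation of a predictor with the difference between two criteria. The article does not state how it was computed; the sketch below uses the standard formula for the correlation of a variable with a difference of two others, and assumes that the entries .507, .590, and .771 in Table II are r14, r15, and r45 for the girls.

```python
import math

def r_with_difference(r_xa, r_xb, r_ab, sd_a, sd_b):
    """Correlation of x with (a - b), from the three correlations and two S.D.s."""
    numerator = r_xa * sd_a - r_xb * sd_b
    denominator = math.sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b)
    return numerator / denominator

# Girls: intelligence (1) against mathematics (4) minus science (5).
r_1_45 = r_with_difference(
    r_xa=0.507,  # r14, Table II (assumed placement)
    r_xb=0.590,  # r15, Table II (assumed placement)
    r_ab=0.771,  # r45, Table II (assumed placement)
    sd_a=0.750,  # S.D. of mathematics honor point averages, Table III
    sd_b=0.760,  # S.D. of science honor point averages, Table III
)
print(round(r_1_45, 3))  # -> -0.133
```

Under these assumptions the result reproduces the -.133 shown for r1(4-5) in Table V, which suggests, without proving, that this is the formula behind the table.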























TABLE III

Means and Standard Deviations of:

(1) Intelligence Quotients,
(2) Average Number of Tardinesses Per Year,
(3) Average Number of Absences Per Year,
(4) Honor Point Averages in Mathematics,
(5) Honor Point Averages in Science,
(6) Honor Point Averages in English,
(7) Honor Point Averages in History and Social Science, and
(8) Honor Point Averages in Languages for 147 Boys and 129 Girls.

             Boys (N = 147)      Girls (N = 129)
Variables    Mean      S.D.      Mean      S.D.
1           117.15    11.75     117.00    12.35
2             2.66     3.38       2.44     3.00
3             4.53     5.01       6.87     5.55
4             .995     .655      1.105     .750
5            1.185     .745      1.165     .760
6            1.010     .615      1.410     .695
7            1.145     .630      1.315     .750
8             .925     .785      1.285     .905

TABLE IV

Differential Prediction Coefficients of Correlation Based Upon:

(1) Intelligence Quotients,
(2) Average Number of Tardinesses Per Year,
(3) Average Number of Absences Per Year,
(4) Honor Point Averages in Mathematics,
(5) Honor Point Averages in Science,
(6) Honor Point Averages in English,
(7) Honor Point Averages in History and Social Science, and
(8) Honor Point Averages in Languages for 147 Boys*.

Variables      1        2        3
(4-5)       -.183     .125     .084
(4-6)       -.024    -.092    -.154
(4-7)       -.017    -.010    -.006
(4-8)       -.050     .010     .024
(5-6)        .132    -.190    -.208
(5-7)        .159    -.129    -.087
(5-8)        .093    -.085    -.041
(6-7)        .010     .118     .208
(6-8)       -.033     .102     .176
(7-8)       -.226     .022     .035

*The probable errors of these differential
prediction coefficients range from .053 to .056.


TABLE V

Differential Prediction Coefficients of Correlation Based Upon:

(1) Intelligence Quotients,
(2) Average Number of Tardinesses Per Year,
(3) Average Number of Absences Per Year,
(4) Honor Point Averages in Mathematics,
(5) Honor Point Averages in Science,
(6) Honor Point Averages in English,
(7) Honor Point Averages in History and Social Science, and
(8) Honor Point Averages in Languages for 129 Girls*.

Variables      1        2        3
(4-5)       -.133     .253     .072
(4-6)       -.066     .010    -.057
(4-7)       -.047     .041    -.006
(4-8)       -.246     .158    -.010
(5-6)        .071    -.233    -.122
(5-7)        .068    -.198    -.079
(5-8)       -.147    -.048    -.070
(6-7)       -.008     .060     .062
(6-8)       -.239     .181     .045
(7-8)       -.225     .131    -.003

*The probable errors of these differential
prediction coefficients range from .056 to .059.
TABLE VI

Means and Standard Deviations in Step
Interval Units of:

(1) Intelligence Quotients,
(2) Average Number of Tardinesses Per Year,
(3) Average Number of Absences Per Year,
(4) Honor Point Averages in Mathematics,
(5) Honor Point Averages in Science,
(6) Honor Point Averages in English,
(7) Honor Point Averages in History and
Social Science, and
(8) Honor Point Averages in Languages for
147 Boys and 129 Girls.

Boys (N = 147) Girls (N = 129)
Mean S.D. Mean S.D.
2.35 9.40 
1.69 1.22 
1.67 2.29 
1.31 5.21 
1.49 5.33 
1.23 5.82 
1.26 5.63 
1.57 5.57 


Variables 


SUMMARY 


The purpose of this study was to determine 
the value of amount of tardiness and amount 
of absence for direct and differential predic- 
tion of academic success as measured by 
teachers’ marks. The data, which were based


upon 147 boys and 129 girls, included the fol- 
lowing eight variables: intelligence quotients, 
average number of tardinesses per year, aver- 


[Vol. 7, No. 1 


age number of absences per year, honor point 
averages in mathematics, honor point aver- 
ages in science, honor point averages in Eng- 
lish, honor point averages in history and 
social science, and honor point averages in 
languages. These data indicate that in the 
case of the boys amount of tardiness and 
amount of absence have no value for predict- 
ing academic success as measured by teachers’ 
marks. For the girls, it was found that 
amount of tardiness correlated significantly 
with honor point averages for science and 
languages, even when intelligence was held 
constant by means of partial correlation. 
Amount of absence did not correlate signifi- 
cantly with any of the honor point averages 
when intelligence was held constant by means 
of partial correlation. The data for both 
sexes indicate that even though the magnitude 
of these relationships is slight the direction of 
the relationships seems to be definitely estab- 
lished; that is, the results strongly indicate 
that an increase in amount of tardiness or 
absence will generally be accompanied by a 
slight decrease in school marks. Further- 
more, the degree of this inverse relationship is 
slightly higher in the case of girls than in the 
case of boys. 

The data also indicate that amount of 
tardiness and amount of absence have neg- 
ligible value for purposes of differential 
prediction. 





A STUDY OF DOUBLE GRADES IN NEW HAVEN 
CITY SCHOOLS¹


ELTON E. KNIGHT
Principal, Jessie I. Scranton Training School, 
New Haven State Teachers College, 
New Haven, Connecticut 


THE PROBLEM AND ITS PHASES


The purpose of this investigation was the 
collecting of a body of data upon which the 
double grade could be evaluated and its 
proper place in educational procedure better 
determined. The problem was to discover 
whether children who were placed in a double 
or combination grade, a room containing two 
grades, could be expected to advance as rap- 
idly in their education as children in the tra- 
ditional elementary school where the proce- 
dure is to have a single grade in a room. The 
study attempted to discover the advantages 
and disadvantages, if there were any, in 
double grades.


This problem divided itself into two parts. 
In the first place, there were children who 
were combined with pupils of the next 
higher grade. Did not some children who 
were combined with a higher grade have an 
advantage by being placed where they were 
allowed to take more advanced work? Or 
was this advantage, if it existed, offset by 
some factors, such as, for example, the added 
number of class groups which the teacher 
may have had to organize in taking care of 
the wider range of individual differences? In
the second place, there were the children who 
were combined with those of a grade lower. 
Were these children not held back or re- 
tarded by having the teacher spend much of 
her time on work which they may have had 
the previous year, or were there other factors 
which compensated for this? In this investi- 
gation, both phases of the above mentioned 
problem were studied. 


Some of the other problems studied were: 


1. The methods used in organization and 
administration of double grades. 

2. The methods and procedures used by 
teachers in teaching double grades. 


1 Abstract of dissertation ted for the degree of
Doctor of Philosophy at Yale University.


3. The amount of acceleration found in 
double grades, and retardation pre- 
vented by them. 

4. The general attitudes of elementary
school principals, teachers, and children 
toward double grades. 


THE GENERAL PROCEDURE 


I. In attempting to answer the scholastic 
problem, carefully controlled classes, in both 
straight and double fourth grades, were set 
up in the New Haven elementary schools. 

II. The other problems in this study were 
investigated in the following four ways: 


1. Through questionnaires to all elemen- 
tary school principals in New Haven. 

2. Through questionnaires to all elemen- 
tary teachers in combination grades in 
New Haven. 

3. Through checking conditions found in
the fourth grade study, both in single 
and combination grade rooms. 

4. Through studying the office records in
the board of education and in the ele- 
mentary schools. 


I. COMPARISON OF EDUCATIONAL 
ACHIEVEMENT 


The fourth grade was selected as the most 
convenient grade in which to conduct this 
comparative study in New Haven, for two 
reasons: 


1. All children in the fourth grade had 
been given the Stanford Achievement 
Test in May, 1934. 

2. During the year 1934-1935 there was 
a sufficient number of fourth grade 
rooms having double grades. 


Experimental and Control Groups 

All classrooms used in this study in which 
there were two grades, one being the fourth, 
will be called experimental classes. The con- 
trol groups, or classes, are the rooms in which 
there were pupils of a single grade, the fourth.
The third and fourth grade experimental 
classes, combined, will be called Experimental 
Group A, and its control group will be re- 
ferred to as Control Group A. The fourth 
and fifth grade experimental classes will be 
termed Experimental Group B, and their con- 
trol classes as Control Group B. The terms 
combination and double will be used synony- 
mously to refer to rooms having two grades. 


The Testing Program 


The following tests were used in the testing 
program: 


National Intelligence Test, Scale A, Form B. 

Stanford Achievement Test, Primary Forms 
X and Z. 

Stanford Achievement Test, Advanced Form 
Z. 

Seaton—Pressey Diagnostic Tests in English 
Composition, (capitalization, good usage, 
punctuation, and sentence structure),
Forms A and B. 


In addition to these, a Social Studies Test 
was improvised. As there was no separate 
social studies test available that would satis- 
factorily cover the range of abilities of the 
children in this experiment, it was considered 
best to take the New Haven courses of study 
in history, geography, science, health, and 
hygiene, and make up a social studies test for 
the comparative purposes of this study. One
hundred statements with five possible an-
swers, only one of which was correct, were
made to cover the curriculum in the subjects
mentioned above. The statements then were
paired first as to subject matter and second
as to difficulty, in order to make two forms.
Difficulty was judged by the grade in which
each appeared in the course of study. The 
correlation between the two tests, which were 
given in October and May, was .56. 


The Pre-Test 


On May 15, 1934, the Primary Stanford 
Achievement Test, Form X, was given and 
scored by the supervising principals under the 
direction of the Department of Testing of 
the New Haven Schools. This test was given 
to all third grade pupils who were present on 
that date. 

The National Intelligence Test was given 
and scored by the director of this study be- 
tween the dates of September 8 and 20. 
It was discovered, as the rooms were being
matched, that there were from thirty to forty 
pupils in these rooms who had not taken the 
Stanford Achievement Test on May 15, due 
either to illness or to living at that time in 
another city. These pupils were tested by 
their respective principals on the 10th and
20th of September, 1934. 

To discover how to use the September re- 
sults with those obtained in May, 68 children 
in six rooms were selected at random and re- 
tested, using the Y form of the Stanford 
Achievement Test. The average educational 
ages of these children were: 


May 15, 1934 
E. A. 117.2 months 


September 20, 1934 
E. A. 116.9 months 


Since the difference of .3 of a month falls 
well within the probable error of the test, the 
September results were treated in the same 
way as those of May. 

On October 1, 1934, the Seaton—Pressey 
Diagnostic Test in English Composition, and 
the Social Studies Test, were given by seven- 
teen fourth-year students in the Educational 
Statistics and Measurement class of the New 
Haven Normal School. 


The Final Tests 


The final tests were given during the week 
of May 20, 1935. The Stanford Achieve- 
ment Test, Form Z, was given by the super-
vising principals. Care was taken to give the
Advanced Form Z to all children who came 
within one and one-half years of the measur- 
ing limits in any subject of the preliminary 
Stanford Achievement Test of the preceding 
spring. Only two of the 329 children given 
the Primary Form, in the spring of 1935, had 
to be retested by the Advanced Form Y. 

The English and Social Studies Tests were 
given by the seventeen Normal School stu- 
dents on May 23. All children who missed 
tests on these dates were tested within the 
following two weeks. 


Selecting and Matching the Groups 


Several factors were taken into considera- 
tion in selecting the class groups to take part 
in this experiment, such as nationality and 
native ability of the children, ability of the 
teachers, and willingness of both principals 
and teachers to co-operate in the experiment. 
This last factor was especially important,
since it was desired to make this experiment 
under actual school conditions, trying at all 
times not to disturb the daily routine of the 
school any more than was absolutely neces- 
sary. 

A general survey was made of all the 
fourth grade rooms in New Haven. The 
children in fourth grade rooms having double 
grades were compared with those in straight 
fourth grades as to: (1) educational age, 
(2) chronological age, (3) nationality, and 
(4) general ability of the respective teachers. 
The last factor was determined by the judg- 
ment of the teacher’s principal and of one or 
more subject supervisors. Keeping these 
factors in mind, the fourth grade children in 
seven third-and-fourth double grade rooms 
were matched with the children in six 
straight fourth grades, as to: (1) average 
educational age, (2) average mental age, and 
(3) chronological age. Also the fourth grade 
children in seven fourth-and-fifth double 
grade rooms were matched with the children 
in six straight fourth grades in the same man- 
ner as in Group A. It was possible to use 
four rooms in both control groups, A and B 
(R.B., H.S., V.K., and B.E. designating the 
rooms by the initials of the teachers). 
Twenty-two different classrooms in thirteen 
schools in various parts of New Haven took 
part in this experiment. 

Tables I and II give the data for the chil- 
dren who remained in the rooms throughout 
the experiment. The data on average ages 
in the tables are as of May, 1934. 

From the low critical ratios it is evident 
that both groups, A and B, were evenly 
matched, with no appreciable difference in 
the average educational, mental, or chrono- 
logical ages. 

The children were probably average Amer- 
ican school children, since they represented 
homes from a rather wide range of social and 
economic background. The racial stock of 
the children in the various experimental and 
control groups was found to be similar. 


The Third and Fifth Grade Pupils in the 
Experimental Groups 
Tables I and II show the average pupil 
abilities in all of the fourth grade classes in 
the experiment. When comparing the abil- 
ity of the third grade pupils in Experimental 
Group A, it was found that those in double 
grade rooms had an average C.A. of 110.7
and E.A. of 120.4; while the average third 
grade pupil in all schools in this study had a 
C.A. of 111.7, and E.A. of 114.5. The aver- 
age E.A. for third grade pupils in all seven 
double grade rooms equaled or surpassed the 
average E.A. of all third grade children, when 
taking each school as a whole. These fig- 
ures would indicate that in the double third 
and fourth grades the pupils in the third 
grade were younger and brighter than the 
average third grade children in their respec- 
tive schools. In E.A. they were only seven 
months below the fourth grade pupils in the 
same rooms. 


When comparing the ability of the fifth 
grade pupils in Experimental Group B, it was 
found that those in double grade rooms had 
an average C.A. of 125 months and an E.A. 
of 121.5; while the average fifth grade pupil 
in all schools in this study had a C.A. of 122 
months and an E.A. of 123 months. The 
average E.A. for the fifth grade pupils in the 
double grade rooms was lower in five of the 
seven rooms; in one it was higher; in the 
other it was the same as that of the fifth 
grade, when taking each school as a whole. 
These figures would indicate that there was 
a tendency to place the older, slower fifth 
grade children in the double fourth and fifth 
grades. In comparing the average E.A. of 
the fifth grade pupils with those of the fourth 
in double rooms, it was found that there was 
only about six months difference in the 
average ages. 


The Range of Pupil Abilities in Double Grade 
Rooms 


A study was made as to how the range of 
pupil abilities in a combination room, when 
taken as a whole, including both grades, com- 
pared with that of the control or single grade 
class. To answer that question, the spread 
of achievement ages between the tenth and 
ninetieth percentiles, and the lowest and 
highest child, was found for each classroom
in the experiment. The average ranges for 
the various groups are given in Table III. 
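
The two measures of spread described here can be sketched as follows. The ages below are hypothetical, not the study's data, and the percentile method (Python's "inclusive" interpolation) is an assumption, since the article does not say how its percentiles were located.

```python
import statistics

def ability_spread(ages_in_months):
    """Return (lowest-to-highest range, tenth-to-ninetieth percentile spread)."""
    deciles = statistics.quantiles(ages_in_months, n=10, method="inclusive")
    full_range = max(ages_in_months) - min(ages_in_months)
    central_spread = deciles[8] - deciles[0]  # 90th minus 10th percentile
    return full_range, central_spread

# Hypothetical achievement ages (months) for one combination room.
room = [98, 104, 108, 110, 112, 113, 115, 116, 118, 120, 123, 140]
full_range, central_spread = ability_spread(room)
print(full_range, round(central_spread, 1))
```

One or two extreme pupils widen the full range sharply while leaving the central 80 per cent almost unchanged, which is why the study reports both figures.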

In taking the differences between the tenth 
and ninetieth percentiles, or 80 per cent of 
the children, no great difference was found 
between the single and double grade rooms. 
As seen by the difference between the highest 
and lowest ages, there generally were found 
in the double grade rooms one or two pupils 
whose abilities were either much superior or
greatly inferior to those of the rest of the
groups.


TABLE III

AVERAGE RANGE OF PUPIL ABILITIES IN TERMS
OF MONTHS, FOR EACH EXPERIMENTAL
AND CONTROL GROUP

                         Average Range    Tenth Percentile
                         from Lowest      to Ninetieth
                         to Highest       Percentile
Experimental Group A         42               20
Control Group A              34               19
Experimental Group B         33               19
Control Group B              28               18


Equalization of Teaching Talent

Since the plan for rotation of teachers was
inadvisable, it was decided to use the
statistical procedure of having a sufficient
number of rooms in each control and
experimental group. By taking six rooms in
each group, keeping the teacher element in
mind while the groups were being selected, it
was believed that the differences in teaching
efficiency could be well cared for. To check
this factor further, the Jacobs Teacher Rating
Scale² was used. An assistant superintendent,
a general supervisor, and the director of this
experiment all visited the rooms, rating each
teacher, before the rooms were finally
matched. The average of the three ratings
was taken as each teacher’s rating.

From the teacher’s rating, an average
weighted rating was obtained for each
experimental and control group, by multiplying
the rating of each teacher by the number of
pupils in her room used in the study. These
products were added together for each group
and divided by the number of pupils in the
group. This statistical procedure equalized
the teacher rating in terms of pupils in her
room. The ratings would indicate that the
group of teachers, when taken as a whole,
were above the average for the city, bearing
out the opinions of the principals and subject
supervisors which were given during the
selection and matching of the groups. The
critical ratio of the difference in ratings
between teachers in Experimental Group A and
Control Group A was .75; in the B groups it
was .25.

2 C. L. Jacobs, The Relation of the Teacher's
Education to Her Effectiveness. New York:
Bureau of Publications, Teachers College,
Columbia University, 1928.


Supervision During the Year

The general plan of this experiment was to
have the principal and the general supervisors
continue with their regular routine of
supervision in these rooms. It was thought that
fairly equal supervision could be expected:
first, because many experimental classes and
their controls were located in the same
building; and second, the general policies of
supervision were discussed and planned at the
monthly principals’ meetings held by the
superintendent and the subject supervisors.

In order that all principals and teachers
might become more familiar with this study,
a meeting was called for the first week in
October. At this meeting the director of the
study explained the purpose of the experiment
and asked for cooperation in carrying on the
daily and weekly work in the schools and
classrooms as if no experiment was taking
place. The subject supervisors were present,
and they made suggestions for the improvement
of teaching which they had planned for the
coming year. Since many questions were asked
by teachers of the experimental groups,
regarding the instruction of two grades in a
single room, the group decided on the
following general principles to be observed
during the year for the double grades.

1. ... curriculum to ...

2. ... the class as if ...

3. There should be as many groups as
necessary to care for individual needs
of the pupils in each room, but groups
should not be formed just because
children are supposed to be in a certain
grade.

4. Pupils who could do the work of the
higher grade should be allowed to do
so, and be given a double promotion at
the end of the year, if, in the judgment
of the principal, they were mentally,
educationally, socially, emotionally, and
chronologically fitted for it.

The rooms in which the experiment was
conducted were visited by the director at
least once each month, and sometimes more
frequently, between October and May, so
that he might keep in touch at all times with
the school work which was going on.
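
The pupil-weighted group rating described under Equalization of Teaching Talent amounts to a weighted mean. A minimal sketch follows; the ratings and room sizes are hypothetical, since the article reports only the resulting critical ratios, and the function names are illustrative.

```python
def teacher_rating(three_ratings):
    """Average of the three raters' scores for one teacher."""
    return sum(three_ratings) / len(three_ratings)

def weighted_group_rating(rooms):
    """Pupil-weighted mean rating; rooms is a list of (rating, pupils_in_room)."""
    total_pupils = sum(pupils for _, pupils in rooms)
    return sum(rating * pupils for rating, pupils in rooms) / total_pupils

# Hypothetical group of two rooms: a rating of 80 with 30 pupils, 90 with 10 pupils.
group = [(teacher_rating([78.0, 80.0, 82.0]), 30), (teacher_rating([90.0, 90.0, 90.0]), 10)]
group_rating = weighted_group_rating(group)
print(group_rating)  # -> 82.5
```

Weighting by pupils means a teacher with a large room counts for more in the group figure than one with a small room, which is the sense in which the procedure "equalized the teacher rating in terms of pupils."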
Results of the Experiment


All Stanford Achievement Test data were 
reported in terms of ages, while in the Eng- 
lish and Social Studies Tests, the results were 
reported in scores, since there were no satis- 
factory age norms available. 

The statistical treatment used in portray- 
ing results of the measuring program is based 
on measures widely used and accepted. The 
mean improvement was determined for all 
groups. From the mean improvement, crit-
ical ratios were determined to find chances of

a true difference greater than zero between 
the experimental and control group. The 
short formula was used in determining the 
standard error of difference. 

According to Garrett,³ a true difference
exists when the critical ratio is 3.0 or above. 
With a critical ratio of 1.3 there are 90 
chances in 100 that the true difference is 
greater than zero; while with a critical ratio 
of 2.2 there are 98.6 chances in 100. 
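
The "chances in 100" quoted for each critical ratio correspond to the area of the normal curve below that ratio. The sketch below assumes a normal sampling distribution and a one-sided reading, which is an interpretation rather than something the article states; it does reproduce the two figures given.

```python
import math

def chances_in_100(critical_ratio):
    """Chances in 100 that the true difference exceeds zero (normal-curve area)."""
    return 100.0 * 0.5 * (1.0 + math.erf(critical_ratio / math.sqrt(2.0)))

for cr in (1.3, 2.2, 3.0):
    print(cr, round(chances_in_100(cr), 1))
```

A ratio of 1.3 gives about 90 chances in 100 and 2.2 gives 98.6, as in the text; at 3.0 the figure exceeds 99.8, which is why that value is treated as practical certainty.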


TABLE IV

MEAN IMPROVEMENT IN MONTHS ON THE
STANFORD ACHIEVEMENT TEST

                          Group A (3 and 4)      Group B (4 and 5)
                          Experi-                Experi-
                          mental     Control     mental     Control
Paragraph Meaning          14.32      12.37       15.36      12.88
Word Meaning               13.67      12.99       14.17      13.36
Arithmetic Reasoning       13.47      12.27       11.41      13.29
Arithmetic Computation     11.78      10.44       11.45       9.25
Spelling                   11.92      17.45       11.11      17.79
Total Test                 12.69      12.61       12.28      12.12

TABLE V

MEAN IMPROVEMENT IN TERMS OF SCORE ON
THE ENGLISH AND SOCIAL STUDIES TESTS

                              Group A (3 and 4)      Group B (4 and 5)
                              Experi-                Experi-
                              mental     Control     mental     Control
English, Punctuation           1.89       2.3         1.75       1.95
English, Sentence Structure    1.16       1.24         .83       1.53
English, Capitalization        3.90       3.82        4.83       4.33
English Usage                  1.95       1.50        1.93       1.55
Social Studies                 6.44       5.70        5.84       5.24

3 H. E. Garrett, Statistics in Psychology and
Education. New York: Longmans, Green and Co., 1926.
TABLE VI

THE CRITICAL RATIOS OF THE DIFFERENCES
BETWEEN THE EXPERIMENTAL AND
CONTROL GROUPS

                            Group A      Group B
                            (3 and 4)    (4 and 5)
Stanford Achievement
  Paragraph Meaning          1.82         2.2
  Word Meaning                .64          .7
  Arithmetic Reasoning        .87         1.33
  Arithmetic Computation     1.4          1.7
  Spelling                   5.36         6.07
  Total Test                  .127         .167
English Composition
  Punctuation                1.3           .65
  Sentence Structure          .29         2.8
  Capitalization              .218        1.46
  Usage                      1.04          .82
Social Studies               1.17          .90


From Tables IV and VI there appears to 
be no appreciable difference between the 
growth of experimental and control groups 
in either A or B, when taking the Stanford 
Achievement Tests as a whole. In consider- 
ing the individual tests, Group A Experimen- 
tal surpassed the Control in the two reading, 
two arithmetic, English capitalization, and 
usage, and social studies tests. In Group B, 
the Experimental surpassed the Control in 
the same tests, with the exception of arith- 
metic reasoning. Both control groups showed 
superiority in spelling, punctuation, and 
sentence structure. 
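
The comparisons in this paragraph can be read directly off Tables IV and V. The sketch below simply tabulates the published mean gains and lists the tests on which the control group gained more; the variable and function names are illustrative.

```python
# Mean gains: (experimental A, control A, experimental B, control B),
# transcribed from Tables IV and V.
gains = {
    "Paragraph Meaning": (14.32, 12.37, 15.36, 12.88),
    "Word Meaning": (13.67, 12.99, 14.17, 13.36),
    "Arithmetic Reasoning": (13.47, 12.27, 11.41, 13.29),
    "Arithmetic Computation": (11.78, 10.44, 11.45, 9.25),
    "Spelling": (11.92, 17.45, 11.11, 17.79),
    "Punctuation": (1.89, 2.3, 1.75, 1.95),
    "Sentence Structure": (1.16, 1.24, 0.83, 1.53),
    "Capitalization": (3.90, 3.82, 4.83, 4.33),
    "Usage": (1.95, 1.50, 1.93, 1.55),
    "Social Studies": (6.44, 5.70, 5.84, 5.24),
}

def control_ahead(table, exp_col, ctl_col):
    """Names of tests on which the control group's mean gain was larger."""
    return [name for name, row in table.items() if row[ctl_col] > row[exp_col]]

ahead_a = control_ahead(gains, 0, 1)
ahead_b = control_ahead(gains, 2, 3)
print(ahead_a)
print(ahead_b)
```

Both lists contain spelling, punctuation, and sentence structure, and Group B adds arithmetic reasoning, in agreement with the paragraph above.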

Of all the differences, only that for spelling 
had a critical ratio above 3. Further study 
indicated that the reason for this difference 
was a greater amount of emphasis placed 
upon spelling by the teachers of the control 
rooms, especially in the free period at the 
beginning of each school session. 


II. GENERAL PROBLEMS CONCERNING THE
ORGANIZATION, ADMINISTRATION, AND
TEACHING OF DOUBLE GRADES


Through the splendid cooperation of the 
elementary principals, complete data were 
collected for all elementary schools in New 
Haven concerning the organization and 
administration of double grades. 

To the 62 teachers of double grades in the 
New Haven schools were sent questionnaires 
regarding their work and problems for the 
year 1934-1935. Of these 59 were returned. 
They covered all types of combinations found 
in grades one to six. 


















September, 1938] 


Results obtained from tabulation of the 
replies to the questionnaires, and from data 
collected from the New Haven schools and 
the board of education, lead the director of 
the study to make the following observations: 


1. Double grade rooms were found most 
often in schools having twelve rooms or less, 
and in sections of the city where pupil popu-
lation was scattered.

2. Double grades were generally organized 
as an economy measure to take care of small 
groups of pupils in two adjacent grades. 


3. There was no one plan of assigning 
pupils to double grade rooms. Placing in- 
ferior pupils of the upper grade with superior 
of the lower was the more common procedure. 
There was some evidence to indicate that the 
average of the upper grade and the superior 
of the lower made the best working combina- 
tion. The method of grouping seemed to be 
an important factor determining the advan- 
tages and disadvantages, as well as the 
teacher’s attitude toward double grades. 


4. Double grades averaged from .8 to 2.5 
fewer pupils per room than single grades in 
the same grade and school. 

5. Teachers with one to five years of ex- 
perience, and those with more than ten years, 
were assigned less often to double grades than 
those who had from six to ten years of ex- 
perience. The number of years of experience 
did not seem to be the chief determining 
factor in the selection of teachers for these 
rooms. 


6. Seventy per cent of the teachers teach- 
ing double grades in 1934-1935 had taught 
double grade rooms before 1934. 

7. Conditions in each room determined the 
way in which teachers divided their time be- 
tween the two grades, as well as the course 
of study to be followed. 


8. Principals agreed that capable teachers 
should be allowed to determine their own 
time distributions and the courses of study 
to be followed. 


9. Teachers of double grades generally 
taught the grades separately in arithmetic 
and reading. 

10. Teachers of single grades divided their 
classes in the various elementary school sub- 
jects into as many or more instructional 


groups, when compared with teachers of 
double grades. 
11. Twenty per cent of the teachers had
difficulty in obtaining teaching material for 
double grades. 

12. Most principals believed that double 
grade rooms were not generally desirable 
from the standpoint of building administra- 
tion or pupil progress. 

13. Principals mentioned twelve advan- 
tages of double grades seventy-one times, 
and seventeen disadvantages forty-one times, 
while teachers mentioned thirteen advantages 
ninety-seven times, and twenty-one dis- 
advantages seventy-four times. 

14. Many of the disadvantages mentioned 
appear contrary to the findings of this study. 

15. Most teachers preferred teaching a 
single to a double grade room, feeling that 
the latter entailed more planning and 
preparation. 

16. Forty-five of the fifty-five teachers re- 
ported that they had worked harder in teach- 
ing double grades than in the last two or 
three single grades taught. 

17. Difficulties in pupil adjustment were 
mentioned most often by teachers having 
first-and-second and fifth-and-sixth combina- 
tion grades. 

18. The majority of teachers felt, when 
considering the child’s educational progress, 
that double grades were undesirable. This 
seemed to be contrary to evidence in the first 
part of this study. 

19. Seventy-six per cent of the teachers 
reported that double grades had no effect, or 
else a favorable one, on the child’s attitude 
toward his school work. 

20. Teachers felt that sixty fourth grade
children were prevented from being retarded
by being in combination rooms in 1934-1935.

21. With but approximately ten per cent 
of the children in double grades, it was found 
that 62.2 per cent of the pupil acceleration 
took place in this type of organization in 
1934-1935. 

22. Less than half of the principals and 
teachers mentioned double grades as a desir- 
able means of caring for pupil acceleration, 
while less than one quarter reported using 
them as a means of preventing retardation. 

23. The majority of the principals were 
opposed to double grades, but they took a 
less critical attitude toward them than did 
the teachers. Twenty-five per cent of the 


principals reported them as desirable in many 
circumstances. 


















24. When children were asked regarding 
the rooms and grades they liked and disliked, 
only one per cent reported a dislike for 
double grades. 

25. The teacher, and not the organization 
of the room, seemed to the child to be the 
important factor. It appears to make little 
difference to children whether they are in a 
single or a double grade room, as long as the 
teacher is sympathetic and impartial and 
plans the work to meet their needs, allowing 
them to do the things especially suited to 
their particular interests and capacities. 

























SOME CONCLUSIONS FROM THE FINDINGS
IN THE STUDY 


The objective testing program led to the 
following conclusions regarding the scholastic 
progress of children in double grades: 


1. The complete Stanford Achievement 
Test (Primary Form) indicated that fourth 
grade children in double grades, whether
combined with the grade above or with the 
grade below, equaled or surpassed in growth 
children in rooms having only fourth grade 
pupils. 

2. Fourth grade children in double grades 
excelled in reading, both paragraph and word 
meaning; arithmetic computation; English 
capitalization and usage; and social studies. 

3. Children in straight fourth grades ex- 
celled in spelling, punctuation, and sentence 
structure. 

4. Spelling was found to be the only sub- 
ject where the critical ratio was above three, 
indicating that the differences here were too 
large to be attributed to chance. This large 
difference was probably due to single grades 
spending more time on spelling and studying 
more words per week. The double grades 
used the extra time, spent by the single grades 
on spelling, in working on arithmetic and
reading.

5. It did not seem to make any difference
whether children were combined with the 
grade above or the grade below, since results
were practically the same for both groups. 

6. With the complete Stanford Achieve- 
ment Tests showing no difference between 
the progress of children of the fourth grade 
in single and double grade rooms; with two 
English tests favoring double grades and two 
the single; and with social studies favoring 
double grades by a small margin, one might 
conclude that scholastically there was little 
difference, if any, between fourth grade pupils 
in single and combination grade rooms. At 
least the children in the double grades did 
not suffer educationally from this type of 
organization. 

The following general findings were 
obtained from office records and question- 
naires: 


1. Double grades were generally organized 
as an economy measure with no standard
plan of assigning pupils to the room. 

2. Principals and teachers generally were 
not in favor of double grades, while the child 
did not seem to have any preference in the 
matter. 

3. Both principals and teachers listed, 
with much closer agreement, more advantages 
than disadvantages for double grades. 

4. Double grades proved to be an organ- 
ization especially suited to care for retarda- 
tion and acceleration. 

5. It was generally agreed by principals 
and teachers that conditions in each double 
grade room should determine the way in 
which teachers should divide their time be- 
tween two grades, as well as the course of 
study to be followed. 

6. Excepting arithmetic and reading, teachers
generally taught their rooms as if they
had one grade. 

7. Obtaining material was a problem for 
twenty per cent of the teachers. 

8. The teacher, and not the organization 
of the room, appeared to be the important 
factor to the child. 


AN EXPERIMENTAL STUDY IN DEVELOPING HISTORY
READING ABILITY WITH SIXTH GRADE PUPILS
THROUGH THE DEVELOPMENT OF AN
ACTIVE HISTORY VOCABULARY*


William Rodgers Phipps


Supervisor of Schools 
Talbot County, Maryland 


The Problem 


This study is based on the assumption that
reading is a complex mental and physical
process conditioned by a number of specific
factors. In this investigation an attempt is
made to isolate the language factor in reading
and to study that factor in its relation to
reading ability in Old World background history
content. Specifically, an attempt is
made to determine the relation of growth in 
ability to use the language of history in writ- 
ten expression and the ability to read history 
material in which the expressions and pat- 
terns of language are found. A number of 
studies in the field of reading point to the 
belief that language is a potent factor in read- 
ing ability and indicate that such a study is 
needed at this time. Among the most com- 
prehensive studies which stress or imply the 
importance of the language factor in reading 
are the studies of Ayer, Carr, McKee, Wiede- 
feld, and Young. These are listed in the 
bibliography of this abstract. 


The Experiment 


For the purpose of this experiment 186 
pupils in six sixth grades in the Talbot 
County, Maryland, schools were selected. 
The experimental procedure covered a period 
extending from November, 1936 to May, 
1937. From February to May, 1937, two 
groups of pupils within the original experi- 
ment were studied after they had been 
equated on factors different from those factors 
used as the basis of equating the original 
groups. 


* Abstract of Doctor of Education dissertation, Johns Hopkins
University, June, 1938.


The Experimental Variable

The essential variable factor in this experiment
was the emphasis placed on the development
of the language of history to the
end that it could be used accurately, clearly,
and meaningfully in written form by the 
pupils. This emphasis was characteristic of 
the teaching method employed by the teach- 
ers of the experimental classes as one means 
of providing a condition of readiness for the 
reading of history content. The difference 
between the teaching procedures of the two 
groups was the use of teaching techniques 
designed to cultivate an active specific vocab- 
ulary in the experimental group while no such 
pointed efforts were employed by the teachers 
of the control group classes. 


Control of Non-Experimental Variables 


The pupils included in the experiment were 
typical of the pupils in the school system as 
a whole. They came from small country 
towns, from farms, and from watering com- 
munities. No locality or school represented 
in the experiment was superior or inferior to 
any other as judged by ordinary standards. 
The schools represented in the study were 
graded schools of four or five teachers. The 
teachers of the classes were all graduates of 
Maryland State normal schools. Their years 
of teaching experience ranged from three to 
six years. All teachers had had the same 
type of supervision for the entire period of 
their teaching experience. The teachers of
the control classes were assumed to be equal 
in teaching ability to the teachers of the ex- 
perimental classes. The same course of study, 
the same textual materials, and other aids to 
teaching were available to all teachers. Each 
class in the experiment was taught history 
for the same length of time each day during 
the period of the experiment. All classes 


received the same amount and type of super- 
vision during the experiment. 


Tests Used in Equating Groups and in
Measuring Learning 


Three factors were used in equating the 
groups: (1) the mental age of pupils as 
measured by the Otis Self-Administering 
Test of Mental Ability, Intermediate, Form 
A, (2) history-reading ability as measured by 
the Anne Arundel History Reading Test. 
This test was constructed by a committee of 
which the investigator was a member. The 
coefficient of reliability, based on scores of
1300 pupils and calculated by correlating odd
and even scores and correcting by the
Spearman-Brown formula, is .826 ± .01.
(3) A composition-vocabulary test designed 
by the experimenter. This test is composed 
of sets of statements relative to the history 
material covered by the sixth grade and pic- 
tures accompanied by statements. The state- 
ments were presented orally so that reading 
was not required on the part of the pupils. 
Each set of statements was followed by a 
question, given orally and in writing on the 
board at the same time. The pupils wrote 
the answers to the questions. The aim was 
to secure relatively free expression in history 
content. The written responses of the pupils 
were analyzed for history vocabulary by 
checking the words used by the pupils against 
the vocabulary of four widely used sixth 
grade texts in Old World background history. 
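
The odd-even reliability procedure mentioned above can be sketched as follows. This is only an illustration of the conventional Spearman-Brown step-up formula; the function names are hypothetical, and the half-test correlation is not reported in the abstract, so it is back-calculated here from the corrected value of .826.

```python
def spearman_brown(r_part: float, k: float = 2.0) -> float:
    """Estimate the reliability of a test k times as long as the part
    whose correlation is r_part (k = 2 for an odd-even split)."""
    return k * r_part / (1.0 + (k - 1.0) * r_part)


def implied_half_correlation(r_full: float) -> float:
    """Invert the k = 2 case: the odd-even correlation that would
    step up to r_full. Used only because the raw value is not given."""
    return r_full / (2.0 - r_full)


# Assumption: .826 is the corrected coefficient reported in the text.
r_half = implied_half_correlation(0.826)
r_full = spearman_brown(r_half)
print(f"implied odd-even r = {r_half:.3f}, corrected r = {r_full:.3f}")
```

The correction always raises a positive half-test correlation, which is why the uncorrected odd-even figure implied here falls noticeably below .826.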

Because of lack of objectivity in this test
of the use of history vocabulary in composition,
and in order to secure a more refined
measure of the pupils’ ability to use history
vocabulary in written expression, another
test was devised by the writer. The pupils
were given incomplete sentences containing
history key-words with the direction that
they were to complete the sentences so that
the second part of the sentence would show
conclusively that they understood the meaning
of the key-word in the first part of the
sentence.

The completion-sentence test was used in
February, 1937, together with scores in history
reading and mental age, for equating separate
control and experimental groups within
the original experiment. In addition to the
tests used for equating, the following tests
were used as measures of pupil-learning:
(1) the Renfrow History Achievement Test,
Test I, Form A; Test II, Form A; (2) Unit
Scales of Attainment, Reading, Division 2,
Form A. Tests were administered in November,
February, and May of the school year
1936-37. Identical tests were administered
each time.


RESULTS


The data in Table I show that the gain
made by the experimental group in history
reading was greater than the gain made by
the control group. The gain for the entire
period of the experiment is statistically significant,
since the critical ratio is 7.

Table II shows that the gains made by the
experimental group in composition vocabulary
usage were significantly greater than
those made by the control group.


TABLE I
GAINS IN READING HISTORY

               No. of      Gain Nov. to Feb.              Gain Nov. to May
Group          Cases   Mean   P.E.   S.D.      P.E.   Mean    P.E.   S.D.      P.E.
Experimental     88    8.46   .38    5.4       .27    12.10   .42    [illeg.]  .29
Control          91    6.51   .39    [illeg.]  .27     8.15   .38    5.4       .26

Difference in Favor of
Experimental Group     1.95   .55                      3.95   .56


TABLE II
GAINS IN HISTORY VOCABULARY USAGE IN COMPOSITION

               No. of      Gain Nov. to Feb.              Gain Nov. to May
Group          Cases   Mean   P.E.   S.D.      P.E.   Mean    P.E.   S.D.      P.E.
Experimental     88    10.0   .37    [illeg.]  .26    18.33   .42    [illeg.]  .29
Control          91    [entries illegible]

Difference in Favor of
Experimental Group     2.4    [remaining entries illegible]
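
The critical ratio of 7 cited for Table I can be checked with a short sketch. It assumes the conventional procedure of the period: the probable error of a difference between two independent means is the square root of the sum of the squared probable errors, and the critical ratio is the difference divided by that probable error. The November-to-May entries are read as 12.10 ± .42 for the experimental group and 8.15 ± .38 for the control group; the function names are illustrative only.

```python
import math


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference between two independent means."""
    return math.sqrt(pe_a ** 2 + pe_b ** 2)


def critical_ratio(mean_a: float, pe_a: float,
                   mean_b: float, pe_b: float) -> float:
    """Difference between means divided by its probable error."""
    return (mean_a - mean_b) / pe_of_difference(pe_a, pe_b)


# Assumed reading of the Table I entries for November to May.
cr = critical_ratio(12.10, 0.42, 8.15, 0.38)
print(f"P.E. of difference = {pe_of_difference(0.42, 0.38):.3f}")
print(f"critical ratio     = {cr:.2f}")
```

Under these assumptions the ratio comes out very close to 7, in agreement with the figure given in the text.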

The correlation between growth in history
reading and growth in ability to use history
vocabulary in composition for the experimental
group for the entire period of the experiment
is .823 ± .02. With the influence of
mental age partialled out the correlation is
.823 ± .02; with history achievement partialled
out the correlation is .803 ± .02; and
with both of the latter factors held constant,
the correlation is .823 ± .02. For the control
group the correlation between history
reading growth and growth in ability to use
history vocabulary in composition for the
entire period of the experiment is -.136 ±
.05. When the influence of mental age is
held constant the coefficient of correlation between
growths in these two measures is found
to be -.146 ± .06; with history achievement
partialled out the correlation is .007 ±
.07, and with both mental age and history
achievement held constant the correlation is
.014 ± .07.
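
The two computations behind figures of this kind can be sketched as follows, assuming the standard formulas: the probable error of a correlation coefficient, 0.6745(1 - r²)/√N, and the first-order partial correlation. The abstract does not report the correlations with mental age, so the values passed to `partial_r` below are hypothetical and serve only to show the mechanics; N = 88 is the number of experimental cases given in Table I.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a product-moment correlation from n cases."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with the influence of z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Reported correlation for the experimental group, N = 88.
print(f"P.E. of r = {probable_error_r(0.823, 88):.3f}")

# Hypothetical correlations with mental age (not given in the source).
print(f"partial r = {partial_r(0.823, 0.30, 0.30):.3f}")
```

When the third variable is uncorrelated with both measures, the partial coefficient equals the original one, which is one way a coefficient such as .823 could remain unchanged after mental age is held constant.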


In both the experimental and control 
groups at the beginning of the experiment, 
with the influence of intelligence held con- 
stant, the correlations between ability to read 
history and ability to use history vocabulary 
in composition were almost identical: .287 ±
.06 for the experimental group, and .283 ±
.06 for the control group. Considering these
figures together with the coefficients of cor- 
relation expressing relationship between 
growth in history reading and growth in the 
use of history vocabulary in composition, and 
considering at the same time that the gains 
made by the experimental group were statis- 
tically greater than the growth made by the 
control group, it would seem safe to say that 
the growth shown in ability to read history 
on the part of the experimental group is 
closely related to a corresponding growth in 
ability to employ a meaningful history vocab- 
ulary in composition. 


TABLE III

GAINS IN HISTORY READING: GROUPS
EQUATED IN FEBRUARY

               No. of      Gain: February to May
Group          Cases   Mean   P.E.   S.D.   P.E.
Experimental     78    3.91   .42    5.5    .29
Control          85    1.44   .31    4.3    .22

Difference in Favor of
Experimental Group     2.47   .52

Table III presents data which indicate that
when the groups within the originally equated 
groups were equated on mental age, Febru- 
ary scores in history reading, and history 
vocabulary used in completion sentences, the 
gain made by the experimental group from 
February to May in history reading was sig- 
nificantly greater than the gain made by the 
control group. 
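
The probable errors of the means in Table III appear consistent with the conventional formula P.E. = 0.6745 × S.D. / √N. The sketch below checks this under the assumption that the author used that formula, and reads the control-group entries as S.D. 4.3 and P.E. .31; the function name is illustrative.

```python
import math


def probable_error_mean(sd: float, n: int) -> float:
    """Probable error of a mean from the standard deviation and n cases."""
    return 0.6745 * sd / math.sqrt(n)


# Assumed reading of Table III: (group, S.D., number of cases).
for group, sd, n in [("Experimental", 5.5, 78), ("Control", 4.3, 85)]:
    print(f"{group:12s} P.E. of mean = {probable_error_mean(sd, n):.2f}")
```

Agreement between the computed and tabled values also supports the reading of the garbled control-group row.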


TABLE IV

GAINS IN HISTORY VOCABULARY USAGE IN
COMPLETION SENTENCES

GROUPS EQUATED IN FEBRUARY

               No. of      Gain: February to May
Group          Cases   Mean   P.E.   S.D.   P.E.
Experimental     78    7.44   .48    6.37   .34
Control          85    1.62   .17    2.44   .12

Difference in Favor of
Experimental Group     5.82   .51


Table IV indicates that for the same period 
of time the growth in the use of history 
vocabulary in completion sentences was in 
favor of the experimental group. 

The correlation between growth in history 
reading and growth in ability to use history 
vocabulary in completion sentences for the 
experimental group is .718 ± .03. With the
influence of intelligence partialled out the
correlation is .901 ± .01. Since partial correlations
presented in connection with the data in
Tables I and II showed that the influence of 
history achievement on the relationship be- 
tween growth in history reading and growth 
in ability to use history vocabulary was neg- 
ligible, and since the pupils in the groups 
equated in February, with the exception of 
16, were the same as those equated in No- 
vember, it was considered unnecessary to in- 
clude history achievement in presenting the 
results pertaining to the groups equated in 
February. 

For the period from February to May for 
the control group the correlation between 
history reading growth and growth in ability 
to use history vocabulary in completion sentences
is -.143 ± .07. When the influence
of intelligence is partialled out the coefficient
of correlation between these two measures is
-.07 ± .07. These two correlation coefficients
are not statistically significant.

At the beginning of the second phase of 
the experiment the coefficient of correlation 
between history reading and ability to use
history vocabulary in completion sentences
was .519 ± .02 for the experimental group
and .550 ± .04 for the control group. With
the influence of intelligence held constant the
correlation coefficient for the experimental
group was .263 ± .06 and for the control
group it was .218 ± .06.


In the case of both groups the coefficients 
of correlation between the original scores in 
history reading and use of history vocabulary 
in completion sentences were lowered when 
intelligence was partialled out. When the 
correlations between growths in the two abil- 
ities measured are considered it is found, how- 
ever, that to partial out intelligence increases 
the positive correlation in the experimental 
group while it increases the negative correla- 
tion in the control group. Considering these 
figures together with the fact that pupils in 
the experimental group made significant gains 
over the control group in history reading and 
in ability to use history vocabulary in com- 
pletion sentences, it would seem safe to say 
that the growth in ability to read history on 
the part of the experimental group is closely 
related to improvement in the ability to use 
history vocabulary meaningfully. 


CONCLUSIONS 


Under conditions which obtained in this 
experimental study the findings seem to in- 
dicate that: 


1. The majority of the pupils in the ex- 
periment learned to read history regard- 
less of the method employed by the 
teachers. 


2. The ability to read history can be im- 
proved by giving attention to the devel- 
opment of a meaningful vocabulary in 
history. 

3. Apparently one of the factors which de- 
termines the specific nature of reading 
material is the vocabulary in which the 
material is written. 


4. In order to develop an adequate back- 
ground for the reading of sixth grade 
history material the method employed 
should take into account the process of 
learning to read. 

5. There is a definite relation among aural
comprehension, verbal expression, and
reading. When the ability to comprehend
aurally and the ability to express
verbally are well developed the ability
to read is improved.


(Vol. 7, No. 1


Since the results of educational research are 
valuable to the extent to which they affect 
practice, the following implications pertinent 
to the investigation are given: 


1. By giving training in the meaningful use 
of the vocabulary of history, the read- 
ing of history material is facilitated. 


2. The building of history concepts together
with supplying the proper vocabulary
for these concepts seems to provide
a readiness for reading history
material.


3. Unless adequate experience relative to 
the ideas symbolized by the spoken and 
printed word is provided, pupils will 
receive training in the use of words
without due attention to their meanings, 
and will often be able to make satisfac- 
tory oral and written responses without 
having a true understanding regarding 
the ideas with which and to which they 
are reacting. 


This investigation does not claim to have 
solved the problem of the specificity of read- 
ing, nor does it hold that the question of the 
relation of language to reading has been an- 
swered. It is merely one small attempt at 
indicating the need for similar investigations 
to the end that those who work with children 
may become increasingly more intelligent in 
their teaching of reading. 


INDIVIDUAL DIFFERENCES IN RETENTION OF GENERAL
SCIENCE SUBJECT MATTER IN THE CASE OF THREE
MEASURABLE TEACHING OBJECTIVES


Aubrey H. Word and Robert A. Davis


The purpose of the present study, which is 
a part of a more extended investigation of 
learning in seventh grade general science, was 
to determine the effect of summer vacation on 
individual differences in performance in the 
case of three objectives. 


The investigation, which was conducted 
with three sections of seventh grade general 
science pupils in the public schools of 
Boulder, Colorado, involved: (1) the formu- 
lation of three measurable objectives for the 
course: ability to recall factual information, 
ability to explain scientific phenomena, and 
ability to draw conclusions from given data; 
(2) construction of three specially designed 
objective tests to measure the extent of 
attainment in these objectives; and (3) ad- 
ministration of these tests at regular two- 
week intervals during a semester of eighteen 
weeks, one form of such tests (Form A) being 
used to measure additional acquisition and, 
except for the initial testing period, the other 
form (Form B) to measure the amount re- 
tained of that already measured two weeks 
previously. Since acquisition and retention 
over two-week intervals were being simulta- 
neously studied throughout the course, it 
seemed logical to extend the investigation so 
as to include the effect of summer vacation 
on retention in these same objectives. 


In order, therefore, to determine the more 
sustained effects of training, the final exam- 
ination, which consisted of all Form B tests 
administered during the semester (a total of 
396 items), was administered in three sittings 
on June 1, 2, and 3 (1937) and again to the 
same group on September 13, 14, and 15 
(1937). On the assumption that the scores 
made on the different parts of the tests in 
June represented the degree of attainment of 
a particular pupil in the three abilities 
measured, his score on the respective parts of 
the tests in September made it possible to 
determine the extent to which these abilities 
had been retained during summer vacation 


University of Colorado 


when no formal instruction in the subject was 
given. Similarly, when the different parts of 
the final examination were combined and 
treated compositely, a fairly accurate picture 
of the amount retained of the varied aspects 
of the course is made possible. In stating 
the results, the three objectives are first 
treated separately and then compositely. 


1. Results of Tests Designed to Measure 
Recall of Factual Information 


Since Part I of the final examination consisted
of 270 simple completion items,¹ the
results of the test in June accurately repre- 
sented the mastery of important facts pre- 
sented during the semester. Likewise, the 
scores on this part of the test in September 
indicated the extent to which this mastery 
had been retained during three months of 
summer vacation. 

The distribution of scores obtained on the 
270 recall items included in Part I for the 
final examination in June and its repetition 
in September is given for pupil groups and 
for such groups treated compositely in Table 
I. Marked variations in performance on 
both testings may be noted for all groups, a 
fact indicating varying degrees of mastery of 
factual information both at the close of the 
instructional period in June and after approx- 
imately three months of vacation. 

When mean scores on the June test are 
compared with those on the September test, 
losses are noted for all groups, the lowest gen- 
eral ability group showing the greatest abso- 
lute loss and the highest group the least. 
Relative losses range from 19.16 per cent for 
the lowest group to 15.60 per cent for the 
highest. An absolute mean loss of 19.05 is 
shown for the sections combined, a loss rep- 
resenting 17.40 per cent of the factual infor- 
mation at the command of pupils in June. 
Since the pupils had not been subjected to
any kind of formal instruction during the
vacation period and were unaware of the contemplated
September testing, the amount of
loss indicates a relatively high degree of
retention.


1 The reliability of this part of the test in __ as determined
by the split-halves method, was .962 ±


TABLE I

SCORES ON FINAL EXAMINATION (JUNE AND SEPTEMBER), PART I, ALL SECTIONS

* In Tables I, II, III, IV, J and S denote June and September respectively.
† IQ’s based on Henmon-Nelson Test of Mental Ability, Grades 7-12.
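
The distinction between absolute and relative loss used above can be made concrete with a brief sketch. It assumes that relative loss is the absolute loss in mean score expressed as a percentage of the June mean. The June mean for the combined sections is not quoted in the text, so it is back-calculated here from the reported figures (a loss of 19.05 points, or 17.40 per cent); the function names are illustrative.

```python
def relative_loss(june_mean: float, september_mean: float) -> float:
    """Loss between testings as a percentage of the June mean."""
    return 100.0 * (june_mean - september_mean) / june_mean


def implied_june_mean(absolute_loss: float, relative_loss_pct: float) -> float:
    """June mean implied by a reported absolute and relative loss."""
    return 100.0 * absolute_loss / relative_loss_pct


# Reported for the sections combined on Part I.
june = implied_june_mean(19.05, 17.40)
september = june - 19.05
print(f"implied June mean      = {june:.1f}")
print(f"implied September mean = {september:.1f}")
print(f"relative loss          = {relative_loss(june, september):.2f} per cent")
```

Because the percentage depends on the June mean, a group with a high initial score can show a larger absolute loss and still a smaller relative loss than a group that began lower.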

The standard deviations from the means 
of scores made in June and September, as 
shown for Sections C, B, and A, for groups 
sectioned (not actually in the classroom but 
for purposes of analyzing the data) on the 
basis of intelligence quotients alone, and for 
the total number of pupils, are less in Sep- 
tember than in June for all groups except 


Section C and the middle group based on 
intelligence quotients. When the coefficients 
of variation, however, are obtained for the 
two sets of values, it is found that the lower 
standard deviations from mean scores made 
in September actually represent an increase 
in individual differences. 
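
How a smaller standard deviation can nevertheless represent greater relative variability is shown by the sketch below, which assumes the usual coefficient of variation, 100 × S.D. / mean. The numerical values are hypothetical and are not taken from Table I; they merely imitate a case in which the mean falls proportionally more than the standard deviation.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


# Hypothetical June and September figures for one group.
june_cv = coefficient_of_variation(sd=30.0, mean=110.0)
sept_cv = coefficient_of_variation(sd=28.0, mean=90.0)
print(f"June:      S.D. = 30.0, V = {june_cv:.1f}")
print(f"September: S.D. = 28.0, V = {sept_cv:.1f}")
```

In this illustration the standard deviation drops from 30 to 28 while the coefficient of variation rises, which is the pattern the text describes for the September scores.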

Relatively, individual differences in per- 
formance in September are much greater for 
Section C than for either Section B or A, 
and those for the lowest group as sectioned 
upon intelligence quotients are greater than 
for either the middle group or for the highest
group. Furthermore, the increase in indi- 
vidual differences in September over those in 
June is much more pronounced for Section 
C than for either Section B or A, and for the 
lowest intelligence group than for either the 
middle or the highest groups. It appears, 
therefore, that individual differences tend 
markedly to increase during summer vacation, 
the degree of accentuation in such differences 
varying inversely with the level of intelli- 
gence or general ability. 


2. Results of Tests Designed to Measure 
Ability to Explain Scientific Phenomena 


The distribution of scores made by the
various groups on Part II (consisting of a
total of 84 items²), which was designed to
measure ability to explain scientific phenomena,
is shown for both testings in Table
II. Comparison of mean scores for the June
and September testings reveals that the greatest
actual loss was experienced by Section B
with a loss in mean score of 5.79, whereas
the mean performance for Section A is identical
for the two testings. Expressed in percentages,
the greatest relative loss is shown
by Section C with 15.57 per cent, whereas no
absolute or relative loss is made by Section A.


For groups sectioned on intelligence quotients,
losses both absolute and relative are
greatest for the lowest group and least for
the highest. When the sections are treated
compositely, a mean loss of 3.81 occurs, representing
a relative loss of 9.09 per cent. It
is evident that comparatively little loss occurs
during the vacation in ability to explain
scientific phenomena.


2 The reliability of this part of the test, as determined by
the split-halves method, was .924 ± .010. Each item consisted
of a statement followed by five plausible explanations,
only one of which was correct.


TABLE II

SCORES ON FINAL EXAMINATION (JUNE AND SEPTEMBER), PART II, ALL SECTIONS


Analysis of standard deviations from mean
scores on Part II indicates marked individual 
differences for each group. Relatively, the 
degree of individual differences on the June 
examination is greatest for Section C and 
least for Section B. In September relative 
differences are greatest for Section C and 
least for Section A. By comparing the re- 
sults of the two tests it will be noted that 
individual differences increase for Sections C 
and B and decrease slightly for Section A 
during the summer, the amount of increase 
being greater for Section C than for Section B. 

For sections based on intelligence, indi- 
vidual differences are relatively greatest for 
the lowest group and least for the highest. 
Individual differences increase for all groups, 
however, the greatest amount of increase 
appearing in the lowest and highest groups. 
For the sections combined, individual differences
increase. Although it is difficult to
compare the results of this part of the test 
with those for Part I, it appears that the 
summer vacation tends to produce greater 
variation in individual differences in ability 
to explain scientific phenomena than in 
ability to recall factual information. 


3. Results of Tests Designed to Measure 
Ability to Draw Conclusions from Given 
Data 

In Table III is given the distribution of
scores for all groups at both testings on Part
III of the test (consisting of a total of 42
items³), that part designed to measure ability
to draw conclusions from given data. Variations
in performance are relatively less pronounced
than for the other two parts of the
test. For all groups fewer extreme cases
occur than occurred in the distributions of
scores for Parts I and II.


3 The reliability of this part of the test, determined by the
split-halves method, was .887 ± .015. Each item consisted
of a statement followed by five plausible conclusions, only
one of which was correct.


TABLE III
SCORES ON FINAL EXAMINATION (JUNE AND SEPTEMBER), PART III, ALL SECTIONS

Losses occur for Sections C, B, and A, the
greatest absolute and relative loss being experienced
by Section C and the least by
Section A.

For the intelligence groups the lowest
group experienced the greatest loss both absolutely
and relatively, whereas the middle
group lost least both absolutely and relatively.
It is interesting to note that although
the relative loss for the three sections
combined is considerably greater for this part
of the test than for the first two parts, the
relative loss for this part is only 0.02 per cent
greater than that for the combined sections
on Part I.


TABLE IV

SCORES ON FINAL EXAMINATION (JUNE AND SEPTEMBER), PARTS I, II, III
COMBINED, ALL SECTIONS

It is evident from the standard deviations 
that individual differences in performance 
vary widely for all groups on both testings. 
Relatively, the degree of individual differ- 
ences for the June examination is greatest for 
Section A and least for Section B, whereas in 
September the relative degree of individual 
differences is greatest for Section C and least 
for Section B. 

For the intelligence groups, individual dif- 
ferences in performance in June are most pro- 
nounced for the lowest group and least for 
the highest. All groups tended to become 
more heterogeneous during the vacation 
period, the degree of increase being relatively 
greatest for the middle group and least for 
the lowest. 

Whether these inconsistencies are due to 
the smaller number of items in the examina- 
tion or to the fact that intelligence and gen- 
eral ability have less influence on the ability 
to draw conclusions from given data than on 
the two other abilities measured can not be 
readily determined. Individual differences 
for the sections combined are much greater 
relatively in September than in June, the in- 
crease being considerably greater than that in 
the first two abilities. 


4. Comparative and Total Analysis of Results 


Table IV presents the June and September 
scores, with means and standard deviations, 
for Parts I, II, and III combined. For the 
general ability sections, the greatest absolute 
loss is suffered by Section B and the least 
by Section A. Relatively, however, Section C 
experiences the greatest loss and Section A 
the least. 

For the intelligence groups, the middle 
group shows the least absolute loss and the 
highest group the greatest. In terms of 
relative loss the lowest group forgot most and 
the middle group least. That the highest 
group is not the group which forgot the least 
is perhaps attributable to its much larger 
score in the June testing. 

When the groups are treated compositely, a 
loss in mean score of 23.49, or a relative loss 
of 14.16 per cent, is found. It may be seen 
that comparatively little of the ability meas- 
ured was lost between June and September. 
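
The relative loss quoted above is the absolute loss in mean score expressed as a percentage of the June mean. A minimal sketch of that arithmetic follows. The function name is illustrative, and the June mean of 165.89 is not stated in the text; it is only the value implied by a loss of 23.49 points and a relative loss of 14.16 per cent.

```python
def relative_loss(june_mean, september_mean):
    """Return the loss between testings as a percentage of the June mean."""
    return 100.0 * (june_mean - september_mean) / june_mean


# Assumed figure: 165.89 is the June mean implied by the reported loss,
# not a value read from Table IV.
june = 165.89
september = june - 23.49
print(round(relative_loss(june, september), 2))
```

The same function applies to any section or intelligence group once its June and September means are known.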

By examining the standard deviations it 
appears that for the sections grouped on general
ability the relative degree of individual
differences in performance on the June test- 
ing is greatest for Section C and least for 
Section B. But in September variations in 
degree of individual differences are least for 
Section A and greatest for Section C. Both 
Sections C and B tend to become more hetero- 
geneous, Section C somewhat more so than 
Section B. On the other hand, Section A 
tends to become slightly more homogeneous 
after the intervenient period. 


For the intelligence groups, the greatest 
relative degree of individual differences in 
performance in June is shown for the lowest 
group and the least for the highest. The 
same condition obtains for these groups in 
September. Individual differences tend to 
increase for all groups during the vacation, 
the tendency being greatest for the lowest 
group and almost imperceptible for the 
highest. 

When the sections are combined, individual 
differences are more marked in September 
than in June, the trend toward increasing 
heterogeneity being similar to that for 
Part I. 


5. General Summary and Discussion 


The results of the study provide a basis for 
indicating several definite trends as follows: 

a. Marked variations in performance on 
both testings are noted for all groups on all 
three parts of the tests, these variations being 
more marked for Parts I and II than for 
Part III. 

b. Individual differences increase between 
testings for all groups and for all parts of the 
test. The increase in individual differences 
is greater for Part III than for Part I or 
Part II.

c. When the parts of the test are treated 
compositely, it is found that in the case of 
pupils sectioned on general ability the great- 
est actual amount of forgetting is experienced 
by Section B and the least by Section A. 
Relatively, however, Section C suffers the 
greatest loss and Section A the least. For the 
intelligence groups, the middle group shows 
the least absolute loss and the highest group 
the greatest. In terms of relative loss the 
lowest group forgot the most and the middle 
group the least. That the highest group was 
not the one which forgot the least is probably 
due to the much larger score which it made 
on the June testing. 
The relative losses for the groups treated
compositely are practically identical on Parts 
I and III. The highest degree of retention 
is found in the case of Part II. For these 
three parts of the test the percentages of loss 
are ranked in order as follows: Part III, 
17.42; Part I, 17.40; and Part II, 9.09. 
When both pupils and tests are treated com- 
positely, the percentage of forgetting during 
vacation is approximately 14 per cent. 

The relatively small amount of forgetting
during the summer raises the question of the
significance of these findings as compared
with those obtained from other investigations.
Before comparisons are made, however, it 
should be pointed out that inasmuch as the 
pupils in the present study had been tested 
at regular intervals during the progress of the 
course, the Form B tests being used as the 
final examination, overlearning is undoubt- 
edly an important factor affecting the amount 
retained. The majority of investigations 
available have employed one of two methods: 
they have either determined how much was 
known at the end of a course and how much 
was known at definite intervals thereafter; 
or they have measured the gain (using initial 
and end tests) made in a subject or a course 
and determined how much of this gain was 
retained after definite intervals of time. 
Thus, in either case, there is only a limited 
number of testings. 

Investigations which measure outcomes 
other than factual information are limited, 
but those available suggest strongly that
ability to apply principles, to explain phenomena,
problem-solving procedures, and
attitudes are retained over a long period with 
only slight loss. The findings of the present 
study, although not altogether comparable 
with those of other investigations, confirm 
these tentative findings and reinforce a com- 
monly accepted belief among educators that 
the permanent outcomes of teaching are to be 
found among the so-called intangible 
objectives. 
MEASURABLE OUTCOMES OF TWO METHODS OF TEACHING
EXPERIMENTAL GEOMETRY* 


DAVID E. BROWNMAN
Chairman of the Department of Mathematics, Murrayhill High School, New York 


THE PROBLEM 


To determine scientifically how the lecture- 
demonstration method compares with the 
individual-laboratory method of teaching, 
when geometric material is presented as a 
laboratory science, is the problem. The procedure
consisted of forming two parallel
groups of pupils which were nearly alike in 
mental equipment as measured by those 
factors which were considered likely deter- 
minants for successful achievement in school 
subjects, and which were about at the same 
point on the learning curve in terms of 
achievement in geometry. The group which 
was taught by the lecture-demonstration 
method was designated the control group and 
the one taught by the individual-laboratory 
method was designated the experimental 
group. 


DEFINITION OF THE PROBLEM 


The geometric material was classified into 
five factors. These factors and their defini- 
tions are 


1. Descriptive Concepts—the fundamental 
definitions of the subject and its nomen- 
clature. 

2. Experimental Concepts—geometric
facts, such as the properties of a figure, 
spatial relationships of two related fig- 
ures, which are determined experimen- 
tally. 

3. Skills—the usual geometric construc-
tions with straight edge and compasses. 

4. Applications—the practical use of the
experimental concepts of geometry in 
industry and “boy scout surveying.” 

5. Integrated Problems—problems involv-
ing an integration of concepts and skills 
brought to bear upon their solution; the 
correlation of geometric principles or 
concepts of geometry, inductively deter- 
mined, in order to deduce other prin- 
ciples more special. 


* Abstract of a thesis submitted in partial fulfillment of 
the requirements for the Degree of Doctor of Philosophy in 
the School of Education of New York University, 1938. 


The content of the geometric material, 
which was taught in the course, consisted of 
90 descriptive concepts, 100 experimental
concepts, and 20 skills. The concepts and 
skills permeated the applications and inte- 
grated problems. Each concept and skill 
was assigned a serial number by means of 
which progress in achievement was traced 
during the experiment. 


DELIMITATION OF THE PROBLEM 


1. To both groups geometry was presented 
as a laboratory science. Only the de- 
scriptive concepts, experimental con- 
cepts and skills were taught. To the 
control group these three factors were 
taught by the lecture-demonstration
method; and to the experimental group 
the descriptive concepts were taught by 
the lecture-demonstration method, while 
the experimental concepts and skills
were taught by the individual-laboratory 
method. 

2. The problem was delimited to pupils in
the 9th and 10th grades of the industrial
high school.

3. No effort was made to equate the dis-
tribution of time for the two groups, 
although the two procedures required 
approximately the same time. 


SPECIFIC ASPECTS OF THE PROBLEM


The following specific aspects of the com- 
parative efficiency of the two methods of 
teaching were investigated: 

1. Scores obtained on the various tests
administered during the experiment.

2. Gains in both immediate and remote 
achievement in the two factors, experi- 
mental concepts and skills. 

3. Achievement of the pupils relative to
the factors of applications and inte- 
grated problems in immediate and re- 
mote achievement. 

4. Immediate and remote achievement
relative to the factor of descriptive con- 
cepts which was taught to both groups 
by the lecture-demonstration method. 
ORIGIN OF THE PROBLEM


The problem arises only when geometry is 
considered an experimental science in which 
the two methods under investigation involve 
contrasting procedures in the teaching of the 
subject. The same problem has arisen in the
field of the natural sciences and has been in- 
vestigated generally in the same manner as 
was done in this study. It was first held by
Perry in 1902 that geometry is “essentially
an experimental science, . . . and should be
taught observationally, descriptively and
experimentally”. Even before Perry such
pioneers as Hill in 1880 and Hanus in 1893 
attempted to introduce the experimental 
treatment of geometry into the United States. 
It was not until the appearance of the junior 
high school movement that experimental 
geometry under the name of intuitive geom- 
etry received a definite place in the curricu- 
lum of the schools. 


PROCEDURE 


The present experiment was conducted 
solely in the Murray Hill Industrial High 
School, New York City, which had as its 
chief purpose during the time of the investi- 
gation the preparation of boys after two 
years of study for entrance into a skilled 
occupation or trade, as contemplated by the 
provisions of the Federal Smith-Hughes Act.
Upon admission to the school each pupil made 
a deliberate choice of his major activity which 
consisted of training in a definite trade. The 
daily program comprised three continuous 
clock hours of instruction in the trade shop 
work of the pupil’s choice, one and one-half 
hours in the mathematics, science and draw- 
ing related to the trade, and one and one-half 
hours in general academic subjects such as 
English, social science, and health education. 
Included in the experiment were pupils reg- 
istered in the architectural drawing, mechan- 
ical drawing, plumbing, and woodworking 
departments. These pupils reported to the 
investigator for instruction in mathematics. 
The pupils in each of the trade departments 
were divided into two sections. In Section 1 
were the pupils in the first semesters of the 
first and second years; and in Section 2 were 
those in the second semesters of the first and 
second years. Each section consisted of four 
classes, one from each trade department. The 
subjects of the control group were drawn 
from Section 1 and those of the experimental
group were taken from Section 2. 

At the beginning of the experiment the 
pupils in both sections were subjected to the 
Otis Self Administering Test of Mental Abil- 
ity, Intermediate Examination Form A, and 
the Henmon-Nelson Test of Mental Ability
Grades 3-8, Form A. The intelligence quo- 
tients of each pupil, as determined by the two 
mental ability tests, were averaged and a 
mean I.Q. was obtained. In addition to the 
mental ability or intelligence tests, the pupils 
in both sections were subjected to a geometry 
inventory test in order to determine the posi- 
tion of each pupil upon the learning curve in 
geometry, as well as to supply basic data 
from which achievement might be measured. 

By plotting the mean I.Q. of each pupil 
on the abscissa axis and the corresponding 
score secured in the geometry inventory test 
on the ordinate axis a scatter diagram was 
constructed which served to indicate graph- 
ically the most likely mates from each of the 
two sections on the basis of general intelli- 
gence and position on the learning curve in 
geometry. Thus, an individual in Section 1 
was matched or paired with one in Section 2. 
Fifty such pairs were secured. The fifty 
representatives from Section 1 composed the 
control group and a like number from Sec- 
tion 2 composed the experimental group. 
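
The pairing described above was done graphically from a scatter diagram. As a rough modern analogue, the sketch below pairs pupils greedily by nearest distance in the (mean I.Q., inventory score) plane. The greedy rule, the unscaled Euclidean distance, the function name, and the sample records are all assumptions made for illustration; the study does not state a numerical matching rule.

```python
import math


def pair_pupils(section1, section2, max_pairs=None):
    """Greedily pair pupils from two sections by closeness of
    (mean I.Q., inventory score). Each record is (pupil_id, iq, score)."""
    # Every cross-section combination, closest first.
    candidates = sorted(
        (math.dist(a[1:], b[1:]), a[0], b[0])
        for a in section1
        for b in section2
    )
    used1, used2, pairs = set(), set(), []
    for _, id1, id2 in candidates:
        if id1 in used1 or id2 in used2:
            continue  # each pupil may appear in only one pair
        pairs.append((id1, id2))
        used1.add(id1)
        used2.add(id2)
        if max_pairs is not None and len(pairs) == max_pairs:
            break
    return pairs


# Hypothetical records, not data from the study.
control_pool = [("a1", 100, 20), ("a2", 110, 30)]
experimental_pool = [("b1", 111, 29), ("b2", 99, 21)]
print(pair_pupils(control_pool, experimental_pool))
```

In the study the limit would be fifty pairs. A fuller treatment would probably rescale the two measures before computing distances, since I.Q. points and test-score points are not commensurable.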

In addition to the mental ability tests and 
the geometry inventory test the two groups 
were subjected to intermediate and final 
achievement tests. The intermediate achieve- 
ment tests were four in number and were 
designated Tests A, B, C, and D, and each 
was administered during a 45 minute period 
on the day following the completion of several 
units of the subject which were within the 
scope of the test. The final achievement 
test, which was administered one month after 
instruction in geometry had ceased, consisted 
of two sections. Each section of the test was 
administered during a 45 minute period. 
Section 1 was identical with the geometry 
inventory test and Section 2 was devoted to 
additional concepts and skills which were 
taught during the course of the experiment 
and not included within the scope of the 
geometry inventory test. 


ORGANIZATION OF DATA 


The tests were constructed in the form of 
the new type short-answer items which could 
be scored objectively. Each item, except 
those for integrated problems, involved either
a single descriptive concept, experimental
concept, or skill, or an application of a single
experimental concept or skill. A record was
kept of the correct responses to the items of
the several geometry tests, and of the concepts
and skills and applications involved, for
each subject of the experiment. Accounting
forms were prepared to enable the investigator
to identify and trace the correct responses
made to a specific factor type in the
geometry inventory, intermediate, and final
achievement tests. On the basis of these correct
responses to specific factors in the inventory,
intermediate, and final achievement
tests, net gains in each of the factors in both
immediate and remote achievement were
determined and summarized.


INTERPRETATION OF DATA


The above data were treated statistically
with two distinct methods for determining
reliability of difference in any specific aspect
of the problem. In one, the data pertaining










TABLE I


SUMMARY COMPARISON OF THE CONTROL AND EXPERIMENTAL GROUPS
ON MEAN SCORES IN THE VARIOUS TESTS


                        Control         Experimental                             Approximate
Test                  Mean     σ       Mean     σ        D      σD     C.R.      Chances

Inventory            22.72    9.12    23.08    8.76     0.36   1.79    0.20    58 in 100
Intermediate A       26.62   10.95    22.36    9.42    -4.26   2.04   -2.09    98 in 100
Intermediate B       20.86    8.16    20.38    7.92    -0.48   1.61   -0.30    62 in 100
Intermediate C       17.38    6.84    17.74    6.75     0.36   1.37    0.25    60 in 100
Intermediate D       17.86    7.74    20.74    7.38     2.88   1.52    1.90    97 in 100
Final, Section 1     45.20   16.65    53.40   18.30     8.20   3.50    2.34    99 in 100
Final, Section 2     12.50    5.84    15.62    6.16     3.12   1.20    2.60    99 in 100
Total                57.98   21.42    68.30   25.80    10.32   4.74    2.18    99 in 100














TABLE II


SUMMARY COMPARISON OF THE CONTROL AND EXPERIMENTAL GROUPS AS TO MEAN NET CHANGE
IN THE MASTERY OF DESCRIPTIVE AND EXPERIMENTAL CONCEPTS, SKILLS, APPLICATIONS,
AND INTEGRATED PROBLEMS AS SHOWN IN THE FINAL ACHIEVEMENT TEST


COMPARED WITH ALL PRECEDING TESTS

             Control         Experimental                              Approximate
Factor     Mean     σ       Mean     σ        D      σD      C.R.      Chances
D         13.30   10.90    19.90   12.85     6.60   2.39     2.76    100 in 100
E          1.32    6.12     6.54    6.51     5.22   1.26     4.14    100 in 100
S          2.08    3.30     3.14    3.35     1.06   0.67     1.60     94 in 100
A          2.60    1.66     3.22    2.24     0.62   0.39     1.60     94 in 100


COMPARED WITH THE GEOMETRY INVENTORY TEST

             Control         Experimental                              Approximate
Factor     Mean     σ       Mean     σ        D      σD      C.R.      Chances
D         14.94   10.20    19.90   11.04     4.96   2.1[?]   2.34     99 in 100
E          4.68    3.25     6.06    3.53     1.38   0.68     2.03     98 in 100
S          2.42    1.92     2.84    2.12     0.42   0.40     1.05     85 in 100
A          0.86    1.00     0.[?]   1.11     0.06   0.23     [?]      60 in 100


             Control         Experimental                              Approximate
Factor     Mean     σ       Mean     σ        D      σD      C.R.      Chances
D          1.02    1.07     1.04    1.26     0.02   0.24     0.08     54 in 100
E          6.62    3.01     8.54    3.15     1.92   0.62     3.10    100 in 100
S          1.64    1.38     1.68    1.26     0.04   0.27     0.15     56 in 100
A          1.74    1.34     2.30    1.69     0.56   0.30     1.87     97 in 100
I          1.48    1.42     1.90    1.45     0.42   0.29     1.45     93 in 100
to the groups were treated as uncorrelated
series and in the other as correlated series. 
The difference between the means of the two 
groups in each aspect of the problem, stand- 
ard error of the difference between the means, 
critical ratio, and approximate chances in 100 
that the true difference is greater than zero 
were calculated. Tables I and II present 
pertinent data. These tables indicate re- 
spectively the scores obtained in the several 
tests and the changes in the three categories 
of remote achievement with respect to the 
factors of the experiment. 
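
The quantities named above can be sketched as follows for the uncorrelated case. The standard-error formula is the usual one for independent means, and N = 50 per group is taken from the fifty matched pairs; reading the "chances in 100" from the normal curve is an assumption, as are the function names. Applied to the Inventory row of Table I, the sketch agrees closely with the printed values (σD = 1.79, C.R. = 0.20, 58 in 100).

```python
import math


def se_diff_uncorrelated(sd_a, sd_b, n_a, n_b):
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def critical_ratio(mean_control, mean_experimental, se_diff):
    """Difference between the means divided by its standard error."""
    return (mean_experimental - mean_control) / se_diff


def chances_in_100(cr):
    """Approximate chances in 100 that the true difference lies in the
    observed direction, assuming a normal sampling distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(abs(cr) / math.sqrt(2.0)))


# Inventory test: control mean 22.72 (σ 9.12), experimental mean 23.08 (σ 8.76).
se = se_diff_uncorrelated(9.12, 8.76, 50, 50)
cr = critical_ratio(22.72, 23.08, se)
print(round(se, 2), round(cr, 2), round(chances_in_100(cr)))
```

The correlated-series treatment would subtract a term involving the correlation between matched pairs from the quantity under the square root; that correlation is not reported here, so it is not sketched.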


DIGEST OF THE FINDINGS 


The statistical measures obtained, when 
the data were treated as uncorrelated series, 
did not vary materially from those which 
were obtained when the data were treated as 
correlated series. In some instances the sta- 
tistical measures were exactly alike. The 
findings were, therefore, summarized without 
reference to which statistical method had 
been used. 

The following are the summarized findings: 


1. During the course of the experiment, 
as revealed by the intermediate achieve- 
ment tests, a transition occurred from 
a trend in favor of the lecture-demon- 
stration method to one in favor of the 
individual-laboratory method in all as- 
pects of the problem. The latter trend 
is marked by substantial differences 
statistically significant in favor of the 
individual-laboratory method with re- 
spect to scores, descriptive concepts, and 
experimental concepts. With reference 
to the skills and integrated problems, 
the differences, while in favor of the 
individual-laboratory method, were 
small and statistically insignificant. 

2. The above trend in favor of the individual-laboratory
method in terms of scores and net changes in the several
factors continued throughout the two sections of the
final test. The differences in favor of this method were
significant in most aspects; and their standard errors
demonstrate with practical certainty the superior
achievement resulting from the individual-laboratory
method in those aspects. By means of Section 1 of the
final test the relative merits of the two teaching
procedures may be gauged under optimum conditions,
since, as was previously indicated, this test was
identical with the geometry inventory test which was
one of the criteria employed in equating the two
groups.

3. In the differences of scores and net 
change in experimental concepts, as in- 
dicated in Section 2 of the final test, 
the superiority of the individual-labo- 
ratory method is again shown with 
somewhat practical certainty. 

4. The differences in favor of the individ- 
ual-laboratory method obtained for the 
net changes in skills, applications and 
integrated problems in both sections of 
the final test and that for net changes 
in descriptive concepts in Section 2 of 
the final test are too small to attach any
significance to them.


CONCLUSIONS 


It may, therefore, be held that the indi- 
vidual-laboratory method is definitely supe- 
rior to the lecture-demonstration method with 
respect to test scores and experimental con- 
cepts. With reference to skills, however, the 
superiority of the individual-laboratory
method is not as marked, as indicated above,
if not somewhat doubtful. There is not
enough statistically significant evidence to 
enable one to lay claims to the superiority 
of either method when the factors of appli- 
cations and integrated problems are consid- 
ered. With respect to comparative achieve- 
ment in the factor of descriptive concepts, 
which was taught to both groups by the 
lecture-demonstration method, it may be held 
with some degree of certainty that the indi- 
vidual-laboratory method with the experi- 
mental concepts and skills appears to have 
exerted a favorable influence. 
CONSTRUCTION, INTERPRETATION, AND USE OF A SIGHT
READING SCALE IN ORGAN MUSIC WITH AN ANALYSIS 
OF ORGAN PLAYING INTO FUNDAMENTAL ABILITIES 


THEO. G. STELZER


Concordia Teachers College, 
Seward, Nebraska 


Due to the significant progress made by 
psychologists in the analysis of abilities in 
reading, numerous diagnostic tests and reme- 
dial techniques are now available to teachers 
in the modern schoolroom. In the field of 
music, however, the lack of reliable and valid 
criteria of likeability and of difficulty seri- 
ously retarded the development of reading 
scales. The following study indicates how 
these obstacles to progress in musicology may 
be overcome by the application of certain 
psychometric methods.* 


Prior to the construction of materials to 
be used in the process of developing a sight 
reading scale, it was necessary to arrive at 
objective criteria of likeability and of diffi- 
culty. Here the criterion of use in the field 
was applied. Through the cooperation of 
organ students and graduates of Concordia 
Teachers College, Seward, Nebraska, and 
River Forest, Illinois, it was possible to secure 
468 samples of organ music best liked and 
within the playing ability of the persons sub- 
mitting them. The questionnaire was so ar- 
ranged that an individual’s complete reply 
made available for study: (1) the least and 
(2) the most difficult organ compositions 
within the group of pieces best liked by him, 
and (3) those measures judged by him to be 
most difficult, as well as (4) other measures 
best liked by him in each of the two selec- 
tions chosen. 


It was to be expected that certain compositions
would be named more than once. Of
the 468 samples received, 159 were found to 
be duplicates of other selections. After these 
were eliminated there remained 309 different 
compositions to be treated by certain psycho- 
metric methods to obtain the needed informa- 
tion on difficulty and likeability in organ 
music. 
* The complete study is on file in the office of Dr. W. . M.
Petry and in the library of the University of Nebraska, and
is an abstract of a Ph.D. dissertation.
ORDER OF DIFFICULTY


For the purposes of properly grouping so 
large a number of compositions, two methods 
were deemed most suitable. The first of 
these methods consisted in sorting the 309 
different items into twelve stacks separated by 
apparently equal-appearing intervals of diffi- 
culty. Then stack 1 contained compositions 
judged to be easier than those in stacks 2 to 
12, and stack 10 contained those items which,
in the opinion of the judge, were more diffi- 
cult than all selections in stacks 1 to 9, but 
less difficult than those placed in groups 11 
and 12. The twelve resulting stacks were 
regarded as representing twelve levels of 
difficulty. 


Four competent judges ranked these com- 
positions individually at different times and 
places. Furthermore, the compositions were 
so arranged that they came to the attention 
of each judge in the same order in which they 
first arrived in the mail. After each judge 
had completed the sorting, the results were 
tabulated by recording the identification num- 
ber that had been assigned to a composition, 
as well as the stack number to which it had 
been assigned. Table I shows the number of 
selections in each of the twelve levels of diffi- 
culty as obtained from the average of the four 
judgments. 

By following the numbers from left to 
right, the reader may notice the consistency 
with which the middle stacks have frequen- 
cies larger than those of the end stacks. This 
was true, also, of the tabulations of individual 
judges. It is interesting to note the same 
tendency in the 159 duplicates. Inasmuch as 
their level of difficulty was derived from the 
stack value of the one identical representative 
copy in the group of 309 compositions, and 
since neither the persons who submitted the 
duplicates nor the judges could have con- 
trolled the resulting frequencies in each stack, 
the distribution seems all the more remarkable.
Since the absence of visibility precluded
the possibility of a constant error resulting
from end effects, an absolute tendency toward
a normal distribution is indicated. The use
of a second method indicated the same
tendency.


TABLE I
FREQUENCY OF CERTAIN COMPOSITIONS AS SORTED INTO TWELVE LEVELS OF DIFFICULTY

Levels       1     2     3     4     5     6     7     8     9     10    11    12
309 Items    11%   20%   387%  386%  40%   50%   37%   24    19    18%   9     4%
159 Items    1     1%    5%    8%    11%   39%   21%   29%   11%   21%   6%    °
468 Items    12%   21%   438   45%   52    90    58%   53%   30%   39%   15%   65.


A second method of ranking was used to
validate the findings of the first and to secure
quantitative scale values of difficulty for each
of the items studied. By a modified use of
the “Method of Paired Comparisons”, each of
the four judges again sorted the 309 selections
into their respective order of difficulty. Since
all selections were placed in positions adjacent
to each other, any one composition was
thereby judged to be more difficult than the
one which preceded it, and less difficult than
the one that followed it. Thus each selection
had been compared with every other selection
and placed accordingly, and since the rank
order given to the separate items by each of
the four judges had been tabulated, it was
possible to find the proportion of ranks
greater than the composite standard,
$p_{R>cs}$, by the use of J. P. Guilford’s
formula:*

\[ p_{R>cs} = \frac{\sum r_i - .5N}{mN} \]

* Joy Paul Guilford, Psychometric Methods, pp. 250-251.
New York: McGraw-Hill Book Company, 1936.

Since there were 309 items, the numbers
designating the rank order ranged from 1 to
309. Then $\sum r_i$ is the sum of the
rank positions assigned to a given item by
each of the four judges, N is the number of
judges (4), and m is the number of ranks
(309). Since the exact use of the method of
paired comparisons would have required each
selection to be compared with itself, the
numerator of the formula contains the correction
(-.5), which supposes that a judge would
have ranked a selection more difficult than
itself in fifty per cent of the judgments. Since
the number of judges and the number of items
remained constant, the numerical values for
-.5N and mN were -2 and 1236, respectively,
in all cases, while $\sum r_i$ varied
according to the sum of the rank orders assigned to
each item by the judges. The case of the 
most difficult item is given here to illustrate 
the use of the formula. This selection was 
consistently rated at 309. By substitution 
the formula then reads 


\[ p_{R>cs} = \frac{(309 + 309 + 309 + 309) - (.5 \times 4)}{1236} = \frac{1236 - 2}{1236} = \frac{1234}{1236} = .9984 \]


Thus .9984 was found to be the per cent value 
of the most difficult item. In the same man- 
ner the per cent values were found for each 
of the 309 compositions. Since the sigma 
value, however, could be read from a table*
only if these per cent values were stated in
terms of distances from the mean, fifty per
cent, it was necessary to subtract fifty alge-
braically from each per cent value. Thus the
value .9984 - .50 = .4984, and, according
to the table,* it fell at a point 2.94 sigma
above the mean. Since the item least diffi-
cult in the opinion of the judges fell at -1.97
sigma, the entire scale of 309 compositions
represented a range from -1.97 sigma to
+2.94 sigma, or, expressed in positive values,
from .00 sigma to 4.91 sigma. Table II shows
the distribution of items upon various levels 
of difficulty. 
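
The scale-value computation described above can be sketched as follows. The function names are illustrative, and the inverse normal function from the Python standard library stands in, by assumption, for the printed table the author consulted; it should give the same sigma value up to rounding.

```python
from statistics import NormalDist


def proportion_above_standard(ranks, n_items):
    """Guilford's proportion for one item: (sum of ranks - .5N) / (mN),
    where N is the number of judges and m the number of ranks."""
    n_judges = len(ranks)
    return (sum(ranks) - 0.5 * n_judges) / (n_items * n_judges)


def sigma_value(p):
    """Distance from the mean, in sigma units, for a proportion p."""
    return NormalDist().inv_cdf(p)


# The most difficult item was ranked 309 by each of the four judges.
p = proportion_above_standard([309, 309, 309, 309], 309)
print(round(p, 4), round(sigma_value(p), 2))
```

Adding 1.97 to each sigma value would then express the scale in the positive values used in the text, from .00 to 4.91.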


Obviously, the method of rank order cor- 
roborated the findings obtained by the first 
method of sorting into twelve levels of diffi- 
culty. Therefore, it was safe to adopt this 
scale of difficulty, consisting of 309 different 
compositions, as a pattern in composing the 
new scale. 


RANGE OF PROBABLE INTEREST IN 
ORGAN MUSIC


Since each individual included in this study 
was represented by his choice of the least 
difficult and the most difficult of organ com- 
positions best liked by him, and since the 
mean stack value and the mean sigma value 
had been computed for each item, it was pos- 
sible to state the difference between his least 
p. 91. New York: Green and , 1930.


TABLE II 
FREQUENCY OF CERTAIN COMPOSITIONS AS GROUPED INTO TWENTY LEVELS OF DIFFICULTY 
difficult and his most difficult selection in
magnitudes of stack differences and of sigma 
differences. Furthermore, since these two 
compositions expressed for a given individual 
his avowed range of interest, the sum of these 
numerical differences divided by the number 
of pairs gave the arithmetic mean, or average 
range of interest manifested by the group that 
contributed to this part of the study. 


According to the first method of sorting the 
music into twelve stacks with equal appear- 
ing intervals of difficulty, the greatest possible 
distance between the least and most difficult 
pieces could have been eleven stacks. How- 
ever, for the entire group the mean distance 
was found to be 2.13 stacks, and this was only 
nineteen per cent of the possible range. Fur- 
thermore, the sigma value obtained by the use 
of the second method ranged from .00 sigma
to 4.91 sigma; thus, the greatest possible dis- 
tance between the two items of an individual 
could have been 4.91 sigma. The mean dis- 
tance so found, however, was .77 sigma. Since 
this, too, was less than sixteen per cent of the 
possible range, we may conclude that the 
range of a given person’s interest in organ 
music comprises a comparatively limited 
range in organ literature. Another important 
conclusion for the teaching, the composing, 
and the publishing of organ music may be 
drawn from this fact. Since the range of in- 
terest proves to be so limited, it will, prob- 
ably, be futile to expect organists or students 
to like pieces which are either too easy or too 
difficult for them to play. Therefore, if a 
group of compositions is to appeal to ad- 
vanced players and to beginning organists, 
the factor attracting interest must be one
other than that of ease or difficulty of
execution. 
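
The two percentages cited above follow from dividing the mean distance between a person's least and most difficult selections by the greatest possible distance on each scale. A minimal sketch of that arithmetic, with an illustrative function name:

```python
def share_of_possible_range(mean_distance, possible_range):
    """Mean observed distance as a percentage of the possible range."""
    return 100.0 * mean_distance / possible_range


# Stack method: mean distance 2.13 stacks out of a possible 11.
stacks = share_of_possible_range(2.13, 11)
# Rank-order method: mean distance .77 sigma out of a possible 4.91.
sigma = share_of_possible_range(0.77, 4.91)
print(f"{stacks:.1f} per cent of the stack range")
print(f"{sigma:.1f} per cent of the sigma range")
```

Both figures fall below twenty per cent, which is the basis for the conclusion that a given person's range of interest is comparatively limited.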
CRITERIA OF LIKEABILITY


Three characteristics were found to be com- 
mon to the best liked measures of the selec- 
tions submitted, namely, melody, smooth 
progression, and enriched harmonies. A given 
composition was said to meet the first cri- 
terion of melody if it contained what to us 
seemed to be a tuneful, melodious theme, not 
necessarily familiar. Of the 468 samples of 
best liked measures that were included in this 
study, all 468 were found to meet this cri- 
terion of melody. It is worthy of note that 
in not one instance had any one copied a mere 
succession of impressionistic chords as a group 
of best liked measures. Chorale motives were 
found forty times, and all cases contained 
melodious themes or motives. 

Moreover, a composition was judged to 
have satisfied the criterion of smooth progres- 
sion if the movement of voices was predom- 
inantly diatonic or chromatic. Again, all 468 
samples were found to meet this criterion of 
smooth progressions of harmony or of coun- 
terpoint. 

The criterion of enriched harmonies was 
met if the selected four best liked measures 
contained one or more notes foreign to the 
prevailing key. Since fifty-seven examples 
were found in which this criterion did not 
apply, it is evidence that music to be well 
liked need not of necessity have enriched har- 
monies. It is of interest, however, that in one 
case only both the least difficult and the most 
difficult best-liked selections of the same in- 
dividual did not embody enriched harmonies. 
Furthermore, the fact that enriched harmo- 
nies were chosen as best-liked fifty per cent 
of the time by fifty-five persons and in one 
hundred per cent of the choices of the remain- 
ing 178 persons is evidence sufficient for the 
purposes of this study that this was a third 
criterion of likeability. 
COMPOSING AND TESTING THE NEW SIGHT
READING SCALE


In agreement with the criteria of likeability 
and the scale already discovered, the writer 
composed fifty-eight compositions embodying 
twenty-six different chorales or hymn tunes 
and covering sixty-five mimeographed pages 
of legal size paper. Six of these numbers, 
representative of the entire scale of difficulty, 
were played before ninety organists who rated 
them for likeability on a five-point scale. It 
has been shown that the average range of in- 
terest manifested in the selections from the 
field extended over less than twenty per cent 
of the total range. It now remained to be 
seen whether incorporation of the three dis- 
covered criteria of likeability would mate- 
rially increase the range of interest. It was 
found that 80.9% showed positive like, 
13.7% were undecided, and only 4.5% of the 
choices indicated tendency to dislike, while 
0.9% showed positive dislike. This was evi- 
dence that the criteria of likeability were 
applicable and adequate for this study. 


To ascertain the preference as to size of 
type to be used in the scale, two specimens 
of a composition were prepared on paper 
8½” x 14”. The lines of the staff of the first
specimen page were 2 mm. apart while the 
lines of the staff in the other were 3 mm., or 
one-eighth of an inch, apart. According to 
the Method of Choice, twenty subjects 
selected that piece which they would prefer 
to read at sight. Since there was only one 
who preferred the smaller type, and since it 
was the judgment of this experienced artist 
and piano teacher that beginning organ stu- 
dents would, no doubt, prefer the larger type, 
it was decided that the larger type should be 
used in this experiment. 

Five subjects of varying ability in organ 
playing then read the entire collection of com- 
positions. The errors were marked to locate 
those parts which caused the reader greatest 
difficulty. Since the errors tended to fall with 
regularity at given places, and with the sam- 
ples submitted from the field indicating the 
same trend, it was assumed that for accept- 
able playing, a given piece was to be consid- 
ered as difficult to read as its most difficult 
part. Therefore, those measures in which 
errors were most frequent were taken as ex- 
cerpts from that composition. These selec- 
tions of approximately four measures, not ex- 
ceeding eight inches in actual length, were 
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then mounted upon 4” x 8” cards to be used 
as reading test items. Accordingly, about 
two such excerpts were made from each page, 
resulting in the choice of 136 samples. 


With the assistance of the first five sub- 
jects, these items were so arranged that a pair 
represented approximately the same level of 
difficulty and, also, required nearly the same 
technique in playing. Only fifty-four com- 
parable pairs were discovered. Therefore, 
fifty-four items were selected as the hypothet- 
ical reading scale to be used in further ex- 
perimentation. Additional items could then 
be selected from the second group if this 
should become necessary. Eleven such items 
were later added to make a more even scale. 


Since the readings of the first five subjects 
indicated: (1) that the greatest improvement 
seemed to result from the first to the second 
reading, (2) that the third reading eliminated 
few errors, and (3) that further attempts 
would be futile and, probably, detrimental 
unless guidance and correction were given, it 
was determined to require three consecutive 
readings of each test item by each subject. 

Errors were defined in terms of notes, time, 
and rhythm. Notes occurring simultaneously 
in a vertical plane constituted one chance for 
error. The time in which such notes or chords 
were to be struck constituted the second 
chance for error. The number of chances for 
error then amounted to twice the number of 
successive notes or chords and complete rests 
in each test item. Thus a chord of six notes 
was judged wrong if any one or more notes 
were incorrect as to either pitch or time. The 
notes occurring in a vertical plane were 
judged as only one chance for error since a 
chord may facilitate reading in one person 
and inhibit reading in another. Furthermore, 
if a rest was not observed, an error of time 
was committed; but, if a note was played dur- 
ing such time for rest, the result was an error 
of pitch rather than an error of time. The 
subject was encouraged to read in a con- 
venient tempo. If he began unreasonably 
fast, he was asked to play more slowly. No 
error was charged for fingering, pedaling, reg- 
istration, or expression. If the pedal notes 
were played with the hand, the subject was 
asked to play with the feet but no error was 
charged. Finally, if the total rhythmic pat- 
tern was broken, it constituted an error. 
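
The scoring rules above may be sketched as follows. This is only an illustration under stated assumptions: the `Event` record and the function names are hypothetical, and counting a broken rhythmic pattern as one additional error is an assumption, since the text says only that it "constituted an error."

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One successive note, chord, or complete rest in a test item (hypothetical record)."""
    pitch_ok: bool  # False if any note of the event was wrong, or a note sounded during a rest
    time_ok: bool   # False if the event was struck at the wrong time, or a rest was not observed


def chances_for_error(events: List[Event]) -> int:
    # Each event offers two chances for error: one of pitch and one of time.
    return 2 * len(events)


def count_errors(events: List[Event], rhythm_broken: bool = False) -> int:
    errors = sum((not e.pitch_ok) + (not e.time_ok) for e in events)
    # Assumption: a break in the total rhythmic pattern adds one error.
    return errors + (1 if rhythm_broken else 0)


reading = [Event(True, True), Event(False, True), Event(False, False)]
print(chances_for_error(reading), count_errors(reading), count_errors(reading, rhythm_broken=True))
```

A six-note chord is a single `Event` here, in keeping with the rule that notes in one vertical plane form only one chance for error of pitch.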

The rank difference method of computing 
the correlation (rho) between various read- 
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ings was frequently applied in order to meas- 
ure the reliability of the procedure adopted. 
Correlations between the number of errors 
committed in various readings of each test 
item were computed after the three readings 
were completed by nine, sixteen, and twenty- 
one subjects, respectively. Since the corre- 
lations were above .90 consistently, and 
mostly from .96 to .99, it was decided to ter-
minate the study after thirty subjects had
completed three readings of each item. 
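
The rank difference method named above is commonly written as rho = 1 - 6Σd² / n(n² - 1), where d is the difference between the two ranks of an item. A minimal sketch follows; the error counts are invented for illustration, and the use of averaged ranks for ties is an assumption, since the text does not say how ties were handled.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical total errors on five test items in two readings.
first_reading = [12, 7, 30, 3, 18]
second_reading = [9, 10, 25, 1, 11]
rho = rank_difference_rho(first_reading, second_reading)
print(round(rho, 3))
```

With many tied ranks the shortcut formula is only approximate; the sketch is meant to show the method, not to reproduce the values of Table III.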


The criterion of errors had proven reliable. 
It was possible to derive a criterion of suc- 
cesses from the same data. The proportion 
of successes was, therefore, figured with six 
different definitions, or criteria, for success: 
namely, when the errors in an individual’s 
reading of a test item did not exceed zero, 
one, two, three, four, or five, respectively, the 
reading of this item was counted a success. 
In each of the six cases, the proportion of suc- 
cesses was translated into a sigma value. 
Table III shows the intercorrelations: (1) be- 
tween successes and successes, as judged upon 
six different criteria, (2) between these suc- 
cesses and the errors made in various read- 
ings, and (3) between each of the several 
readings according to the errors recorded. 
These seventy-eight intercorrelations formed
the basis for the conclusion that the proposed
criteria of errors and successes were reliable. 
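
The translation of a proportion of successes into a sigma value can be sketched with the inverse of the normal distribution. This is an illustration under assumptions: the text does not give its exact procedure, the sign convention (harder items receive higher values) is assumed, and the clamping of proportions of exactly 0 or 1 is a convenience added here.

```python
from statistics import NormalDist
from typing import Sequence


def success_proportion(errors_per_reading: Sequence[int], max_errors: int) -> float:
    """Share of readings counted as successes under the 'max_errors or fewer' criterion."""
    successes = sum(1 for e in errors_per_reading if e <= max_errors)
    return successes / len(errors_per_reading)


def sigma_value(proportion: float, n_readings: int) -> float:
    """Normal deviate for an item; assumed convention: fewer successes give a higher value."""
    # Assumption: keep the proportion away from 0 and 1, where the deviate is undefined.
    p = min(max(proportion, 0.5 / n_readings), 1 - 0.5 / n_readings)
    return NormalDist().inv_cdf(1 - p)


# Hypothetical errors in five readings of one item, scored with "0 to 4 errors = a success".
p = success_proportion([0, 2, 5, 4, 7], max_errors=4)
print(p, round(sigma_value(p, n_readings=5), 2))
```

The six criteria of the text correspond to `max_errors` values of zero through five.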

Notwithstanding these high correlations, it 
was advisable to remove all doubt about the
existence of a physical scale. Did the addi-
tion of a note, chord, or rest per se add to the
difficulty of the scale? Then the number of 
chances for error would be increased propor- 
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tionally by and because of each added note, 
chord, or rest on the horizontal plane. In 
that case, the percentile rank of difficulty for 
each item could be obtained by dividing the 
total number of actual errors by the number 
of possible errors times the number of sub- 
jects times the number of readings per sub- 
ject. This was done. By correlating the 
percentile rank of each item with the rank
order obtained from the sum of the errors of 
the three readings of thirty subjects, rho was 
found to be .78. In comparison with the 
amounts of the correlations in Table III this 
seemed low. Furthermore, the test item 
which constituted the eighth level of the final 
scale of ten steps and which had, in the total 
series of sixty-five, ranked fifty to fifty-three 
according to the errors and successes, now 
ranked only nineteen according to the per- 
centile rank. In this particular item, the 
first four notes constituted a “figure” derived 
from the tonic chord in G major which might 
have occurred as a single chord of four notes. 
The fact that the four notes occurred on four 
perceptible beats and not upon one beat as in 
a chord, made more errors possible although 
not probable. Therefore, the difficulty of 
reading notes depends not alone on the num- 
ber of chances for error but upon relations of 
meaning or of behavior patterns. One stim- 
ulus as a whole is more or less difficult than 
another, and a group of notes falling in a cer- 
tain hand position may be more easily read 
because of such grouping. Therefore, the 
reading scale was plotted upon the basis of 
errors and successes as the most reliable and 
valid procedure. 
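
The percentile rank of difficulty described above is the ratio of errors actually made to errors that could have been made. A minimal sketch, with invented figures and illustrative names:

```python
def difficulty_ratio(total_errors: int, chances_per_item: int,
                     subjects: int, readings_per_subject: int) -> float:
    """Actual errors divided by possible errors x subjects x readings per subject."""
    possible = chances_per_item * subjects * readings_per_subject
    return total_errors / possible


# Hypothetical item: 20 chances for error, 30 subjects, 3 readings each, 270 errors in all.
ratio = difficulty_ratio(270, chances_per_item=20, subjects=30, readings_per_subject=3)
print(ratio)
```

Because the denominator grows with every added note, chord, or rest, an item spread over more beats can rank as easier on this measure than it proves to be in actual reading, which is the weakness the text describes.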


TABLE III 


INTERCORRELATION OF FIFTY-FOUR MUSIC ITEMS ACCORDING TO ERRORS AND SUCCESSES MADE BY
THIRTY PERSONS IN VARIOUS READINGS


                               1      2      3      4
Errors of 1st reading ....   ....   .967   .964   .985
Errors of 2nd reading ....   .967   ....   .986   .993
Errors of 3rd reading ....   .964   .986   ....   .988
Errors of readings 
1-2-3 ; - F 
.979 .997 


.996 .994 


. 993 .979 
.987 .995 
.979 .995 
.997 .994 


O66 ..-.. 


5 6 7 8 9 
971 . . 942 
.999 . . 948 
. 993 . . 954 
. 989 . 9% 


. 985 . 


10 

. 940 
. 974 
. 981 


11 

. 971 
. 982 
. 978 


. 988 


12 


. 979 
. 976 
. 969 


. 982 


13 


. 967 
. 946 
. 931 


. 957 


Least errors per person. . 
Zero errors= a success. . 
0 to l error=a success . 
0 to 2 errors=a success . 
0 to 3 errors=a success . 
0 to 4 errors=a success . 
0 to 5 errors=a success . 


-993 .989 
. 929 .929 
. 954 . 947 
-981 .971 


-981 .993 

. 922 .940 . 
-936 .955 . 
.964 .975 . 
-976 .983 . 
.977 .975 . 
.972 .939 . 
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The errors made in the readings having 
been obtained and the proportion of successes 
reduced to sigma values, it was possible to 
plot on a pair of coordinate axes the sigma 
values of the successes as the abscissa and the 
number of errors as the ordinate. Since, in a 
perfect correlation, the items fall into a 
straight line when plotted, those items falling 
most nearly into a straight line in Figure 1 
were selected to form the graduated scale in 
sight reading. Since the items so selected 
formed a scale when scored either according 
to errors or successes, a psychological sight 
reading scale existed. It was found that the 
criteria of success with as high as three and 
four errors most nearly approximated a 
straight line; however, the criterion which as- 
sumed that the occurrence of four errors or 
less in one reading of a given stimulus con- 
stituted a successful reading of that item, had 
the further statistical advantage that four was 
the approximate mean of all the errors. The 
ninety readings of sixty-five items by thirty 
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subjects had resulted in a grand total of 
23,912 errors; therefore, 
$$M = \frac{23912}{65 \times 90} = \frac{23912}{5850} = 4.08$$





While the degree of difference between suc- 
ceeding steps of the scale is not identical in 
each case, Table IV shows in quantitative 
terms the extent of uniformity approximated 
in the spacing. 


From all data available it is reasonably 
certain that players will have progressively 
more errors and fewer successes in this scale 
as they approach and exceed their level in the 
reading of organ music. 


It will be seen from Figure 3 that the read- 
ing of the sixty-five items classified the sub- 
jects into ranks of ability. Thus, also, the 
new scale places a reader upon his particular 
level. Furthermore, Table V shows the high 
consistency with which individual subjects 
maintained their level in various readings or 
combinations thereof. 
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READING SCALE IN ORGAN MUSIC 


TABLE IV 


DISTANCE FROM ONE ITEM OF THE SCALE TO THE NEXT IN TERMS OF DIFFERENCES IN ERRORS
MADE IN VARIOUS READINGS AND OF DIFFERENCES IN SIGMA VALUES 


Differences in Terms of: 1-2 23 
Sigma values 0 to 4 errors a success_..42 31 
Sum of errors in all readings 99 86 
First and second readings 58 
First reading $1 
Second reading 27 
Third reading 28 
Least errors of three readings 24 


Figure 3. Distribution of Subjects in Terms of
Total Number of Errors Made in Three
Readings Each of Sixty-Five Music Items


TABLE V 


INTERCORRELATIONS OF THIRTY PERSONS 
ACCORDING TO VARIOUS READINGS OF 
CERTAIN Music ITEMS 


Errors Committed in:                            1      2      3      4      5
1. First reading of sixty-five items          ....   .967   .979   .985   .979
2. Second reading of sixty-five items         .967   ....   .992   .993   .967
3. Third reading of sixty-five items          .979   .992   ....   .993   .965
4. All three readings, sixty-five items       .985   .993   .993   ....   .976
5. All three readings, ten items of scale     .979   .967   .965   .976   ....


The assumption that the new reading scale 
of ten levels is both valid and reliable was 


further established by the close agreement be- 
tween the scale values of the least and the 
most difficult best-liked pieces originally sub- 


Items of the Sight Reading Scale 
84 45 


Mean 
7-8 89 9-10 Distance 
47 81 39 86.48 427 
140 120 126 199 116.9 
90 889 85 140 80.5 
55 ~=—s« 60 31 80 43.0 
35 29 54 «60 37.6 
50 = 31 41 53 35.7 
56 22 47 57 35.1 


5-6 6-7 


mitted by, and the placement in the actual 
reading of the new sight reading scale of, 
twenty organ students who took part in both 
phases of this study. It is significant that ten 
of these persons were tested by three different 
examiners in River Forest, Illinois. 

If, at a given time, individuals have been 
found to represent each level of this scale, 
they may be used as laboratory subjects to 
classify organ music which is new to them. 
Of all who read a new composition acceptably 
at sight, the one lowest in rank on the scale 
determines the difficulty of that piece. His 
scale number is assigned to it. In a similar 
manner, among those who read the piece ac-
ceptably at sight and like it, the persons low-
est and highest on the scale determine by
their scale position the probable range of 
likeability for that piece. 
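
The classification procedure just described can be sketched as below. The `Reader` record and the sample panel are hypothetical; only the rule itself (lowest acceptable reader sets the difficulty, lowest and highest readers who also like the piece set the range of likeability) comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reader:
    scale_level: int       # the reader's level on the ten-step scale
    read_acceptably: bool  # read the new piece acceptably at sight
    liked: bool            # reported liking the piece


def classify_piece(readers: List[Reader]) -> Tuple[Optional[int], Optional[Tuple[int, int]]]:
    """Return (difficulty level, (lowest, highest) level of likeability)."""
    acceptable = [r.scale_level for r in readers if r.read_acceptably]
    liking = [r.scale_level for r in readers if r.read_acceptably and r.liked]
    difficulty = min(acceptable) if acceptable else None
    like_range = (min(liking), max(liking)) if liking else None
    return difficulty, like_range


panel = [Reader(2, False, False), Reader(4, True, False),
         Reader(6, True, True), Reader(9, True, True)]
difficulty, like_range = classify_piece(panel)
print(difficulty, like_range)
```

Returning `None` when no reader qualifies is a design choice made for the sketch; the text does not discuss that case.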

The guidance value of this scale may be 
seen from the following: For an individual, 


1. If the sum of his errors in the three 
readings of one stimulus item equals 
zero, mastery of that level of difficulty is 
indicated. This mastery zone offers 
music which may safely be attempted 
in sight reading before an audience. 

2. If the sum of his errors in the three
readings equals or is less than four, it is 
a zone of safety for private reading. 
With proper guidance or concentration, 
mastery should soon be possible for this 
individual. 

3. If the sum of his errors in the three
readings is greater than four, but his 
errors in any one reading equal or are 
less than four, we have a zone safe for 
practice and study. 

4. If the number of errors in each of the
three readings exceeds four, the stimulus 
exceeds the upper limen in the reading 
ability of that subject. It is too diffi- 
cult for him to read at sight at that time. 
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5. Thus, an individual’s proximity to a 
given zone is indicated by the extent to 
which his errors exceed or are less than 
four. 
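
The four zones listed above amount to a simple decision rule on the errors of the three readings of one item. A minimal sketch follows; the zone labels are paraphrases, and "any one reading" in the third rule is read here as "at least one reading," which is an interpretation.

```python
from typing import Sequence


def reading_zone(errors: Sequence[int]) -> str:
    """Classify one test item from the errors made in each of its three readings."""
    total = sum(errors)
    if total == 0:
        return "mastery"             # safe for sight reading before an audience
    if total <= 4:
        return "private reading"     # zone of safety for private reading
    if min(errors) <= 4:
        return "practice and study"  # at least one reading with four errors or fewer
    return "too difficult"           # every reading exceeds four errors


for sample in [(0, 0, 0), (2, 1, 1), (6, 4, 3), (9, 7, 5)]:
    print(sample, reading_zone(sample))
```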


Since the ten selected test items have a 
total of forty and one-half measures ranging 
from three to six and two-thirds measures per 
item with a mean of 4.05 measures, we may, 
in general, accept one error per measure as a 
crude criterion for measuring a student’s abil- 
ity to read organ music at sight. If, there- 
fore, he has one error or less per measure in 
the reading of a given selection at sight, he 
should be able to master the composition with 
a reasonable amount of practice. If, however, 
the errors exceed this number, the composi- 
tion will probably require considerable study 
before mastery can be achieved, and the stu- 
dent should be encouraged to study easier 
music until he has reached a higher degree of 
achievement. 
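
The crude criterion of one error per measure can be expressed as a short check. The function names and the sample figures are illustrative; the mean of 4.05 measures per item is the value given in the text.

```python
def errors_per_measure(errors: int, measures: float) -> float:
    return errors / measures


def within_reach(errors: int, measures: float) -> bool:
    """True when a sight reading shows one error or less per measure."""
    return errors_per_measure(errors, measures) <= 1.0


# Forty and one-half measures in ten items, as reported in the text.
mean_measures = 40.5 / 10
print(mean_measures, within_reach(4, 4), within_reach(9, 4))
```

Since the mean item runs about four measures, the limit of four errors per reading used in the zones agrees closely with one error per measure.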

The study concluded with an analysis of 
organ playing into eight fundamental abil- 
ities, derived from the errors made by the 
subjects and by careful observation and study 
of their individual difficulties and powers. 
These abilities were arranged in a chart which 
is here given. 


CONCLUSIONS 


This study warrants the following con- 
clusions: 
1. For an individual, the range of interest
in organ music is limited to a comparatively
narrow range of difficulty.


2. This range of interest in organ music
may be considerably increased if the criteria
of likeability are employed, namely,
melody, smooth progressions, and enriched
harmonies.

3. Beginning organ students prefer a large
size staff and notes.

4. The number of errors made in time and
in pitch is a reliable and valid criterion
in grading the difficulty of reading organ
music at sight. In general, if the number
of errors in reading exceeds one for
each measure, the composition is still
too difficult for reading purposes.

5. The greatest improvement in reading is
noticeable from the first to the second
reading. If errors persist through the
third reading, information and guidance
are needed.

6. Fluent reading habits are best acquired
when the music is within the player’s
range. Music that is too difficult tends
to cause undesirable reading habits or,
even, disintegration of control.

7. The sight reading scale developed in this
study is a reliable and valid measure of
ability in the reading of organ music.
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THE EFFECT OF BUDGETING TIME ON THE ACHIEVEMENT 
OF FRESHMEN NORMAL SCHOOL GIRLS* 


Bess E. JOHNSON 


State Normal School 


A large number of American people in gen- 
eral, as well as the young people who enter 
colleges and normal schools each year, seem 
to be thoroughly convinced that all people 
are born free and equal. After careful con- 
sideration, the falsity of this supposition be- 
comes apparent. Differences in quantity and 
quality of bodily structure, health and ambi- 
tion, capacity and aptitude for learning, and 
knowledges and skills developed are conspic-
uous in all individuals. “The most important
educational problem, however, does not con-
sist in determining more facts about these dif-
ferences in native endowment or devising
more practical and reliable methods for their
measurement, but in finding a way of helping
each student learn to make the best possible
use of the talents and the energy and the time
which he possesses.”¹

“The heaviest responsibility carried by any 
person is that of investing the twenty-four 
hours a day which are allotted to him.”² If
young people can be taught to realize that 
achievement in any line of endeavor is vitally 
conditioned not only by capacity and aptitude 
but also by how they use each period of the 
twenty-four hours, much of their time and 
energy might produce more satisfying results. 

A large number of studies concerned with
the effect of intelligence, praise and censure 
as an incentive, reading ability, rivalry, extra- 
curricular activities, as well as other factors 
have been made. Comparatively few studies 
have dealt with the effect of a daily schedule 
for study, recreation, eating, personal care, 
and rest on scholastic achievement. 

Book³ found that after giving instruction
to students on how to make a plan for the 
use of all available time, the percentage of 
efficiency for one group rose from 76 to 96 

* Field Study No. 1, Colorado State College of Education,
Greeley, Colorado.

¹ W. F. Book, How to Succeed in College, p. 22. Baltimore:
Warwick and York.

² L. A. Headley, How to Study in College, p. 377. New
York: Henry Holt and Company.

³ W. F. Book, “Results Obtained in a Special ‘How to Study’
Course Given to College Students,” School and Society, XXVI
(October 22, 1927), 529-34.


Geneseo, New York 
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per cent, and in another group it rose from 
84 to 98 per cent. He states that “efficient 
planning and the ability to carry out one’s 
plan is, in fact, the most important single 
factor in successful work of every sort.”⁴

Bird,⁵ Headley,⁶ Kitson,⁷ KleinSmid,⁸ Mor-
gan,⁹ Payne,¹⁰ Shreve,¹¹ Werner,¹² and
Whipple¹³ have recognized budgeting of time
as a significant factor in obtaining results 
from studying, but report no experiments to 
check this statement. 

The questionnaire method has been em- 
ployed by a number of investigators to deter- 
mine how the twenty-four hours of the day 
were used by college and high school students. 

In a study made at the University of Idaho, 
Goldsmith and Crawford¹⁴ found that the
average amount of time spent on studying 
was three hours per day. This is about two 
hours less per day than that found in similar 
studies made by Comstock,¹⁵ Farrell,¹⁶ and
Hutchinson and Conrad.¹⁷ Sturtevant and
Strang¹⁸ found that in a group of ten superior

⁴ Ibid., p. 532.

⁵ Charles Bird, Effective Study Habits. New York: The
Century Company, 1931.

⁶ L. A. Headley, op. cit., Chapter XIV.

⁷ H. D. Kitson, How to Use Your Mind. Philadelphia:
J. B. Lippincott Company, 1916.

⁸ R. B. von KleinSmid and F. C. Touton, Effective Study
Procedure in Junior College and Lower Division Courses. Los
Angeles: University of Southern California [remainder illegible].

⁹ [Author's initials and title illegible] Morgan. New York: The
Macmillan Company, 1934.

¹⁰ W. L. Payne, “Methods in Teaching How to Study,”
School Review, XXXVIII (October, 1930), 598-604.

¹¹ Francis Shreve, The Supervised Study Plan of Teaching.
Richmond, Virginia: [illegible] Publishing Company, 1927.

¹² O. H. Werner, Every College Student’s Problems, Chapter
II, “The Wise Use of [Time, Effort], and Money.” New York:
Silver, Burdett and Company, 1929.

¹³ G. M. Whipple, How to Study Effectively. Bloomington,
Illinois: Public School Publishing Company, 1927.

¹⁴ A. G. Goldsmith and C. C. Crawford, “How College Stu-
dents Spend Their Time,” School and Society, XXVII (March,
1928), 399-402.

¹⁵ Alzada Comstock, “Time and the College Girl,” School
and Society, XXI (March 18, 1925), 326-27.

¹⁶ Frances V. Farrell, “Time Expenditures of Teachers Col-
lege Education Majors.” Unpublished Master of Arts Thesis,
Colorado State Teachers College, Greeley, Colorado, 1934.

¹⁷ Ruth Hutchinson and [initials illegible] Conrad, “What’s in a
College [illegible],” School and Society, XXIV (December, 1926),
768-[illegible].

¹⁸ Sarah M. Sturtevant and Ruth Strang, “A Study of the Twenty-
Four Hour Schedules of Forty High School Girls,” Teach-
ers College Record, XXVIII (June, 1927), 994-1010.
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high school girls the average amount of time 
devoted to study was two hours and a half 
per day. Book¹⁹ found college students de-
voting considerably more time to study. The
median number of hours per week was 34.30. 

Crawford²⁰ attempted to determine, “What
measurable factors significantly affect the 
quality of student’s classroom work?” In a 
study made previous to the mid-term exam- 
inations he found that the range of time de- 
voted to study extended from zero hours to 
thirty-six hours per week. For a student 
group of 1,306, the mean number of hours 
per week devoted to study was 20.56 with a 
S.D. of 7.74. 

No conspicuous differences are found in 
the above mentioned studies as to the differ- 
ing amounts of time spent by students for 
sleeping, eating, personal care, and recreation. 
Book²¹ found the median number of hours
per week wasted or lost by students to be 31. 
It may be significant that no such loss of 
time is reported in the other studies. Per- 
haps most students in high school and college 
have an inadequate conception of the value 
of time, or perhaps they believe that time 
spent in thorough enjoyment is never wasted. 

The present study differs from those re- 
viewed in that it is a controlled experiment 
covering a period of sixteen weeks. It seeks 
to determine the effectiveness of making and 
using a budget for the twenty-four hours of 
the day. 


THE PRESENT INVESTIGATION 


The aim of this study is to determine the 
relative value of definite planning of one’s 
time for the twenty-four hours of the day, as 
compared with unplanned use of that time, 
in the achievement of normal school freshmen 
girls. 

This experiment was conducted during two 
successive years at the State Normal School 
at Geneseo, New York. Each year two 
groups of freshmen were equated on the basis 
of sex, age, high school averages, scores ob- 
tained on a psychological examination, and 
on a reading test. In the 1935 experiment 
the 1934 edition of the American Council on 
Education Psychological Examination and 
the New York Reading Comprehension Test 
were used in equating the groups. In the 


¹⁹ W. F. Book, How to Succeed in College, pp. 22-24.

²⁰ A. B. Crawford, Incentives to Study: A Survey of Stu-
dent Opinion, pp. 13-21. New Haven: Yale University Press.

²¹ W. F. Book, op. cit., pp. 22-24.
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1936 experiment the 1936 edition of the 
Teachers College Psychological Examination 
and the Iowa Silent Reading Test were used. 
Thirty-one pairs of students participated in 
the 1935 experiment and forty-six in the 1936 
experiment. 


The physical conditions operating in the 
1935 experiment were “identical”, including 
courses, recitation days, instructors, and 
rooms. The sections in each subject, how- 
ever, were meeting at different hours of the 
same day. The same conditions held for the 
1936 experiment except that twenty-five 
pairs of students recited with one instructor 
in the Department of Psychology, while 
twenty-one pairs recited with the writer. The 


results of these pairings are shown in Tables 
I and II. 


A pre-test covering the work to be offered 
in the child development course was given to 
both the control and experimental groups 
during the second week of the first semester. 
This test was given again at the end of the 
experimental period and comparisons made. 
The results are shown in Table III. 
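
The comparison of pre-test and end-test can be sketched as a difference of gains. The means below are those printed in Table III; the dictionary layout and function name are illustrative only.

```python
# (end-test mean, pre-test mean) for each group, as printed in Table III.
TABLE_III = {
    1935: {"control": (94.42, 66.20), "experimental": (92.35, 67.02)},
    1936: {"control": (82.82, 56.10), "experimental": (79.02, 61.85)},
}


def gain(end_mean: float, pre_mean: float) -> float:
    """Mean gain from pre-test to end-test, rounded to two decimals."""
    return round(end_mean - pre_mean, 2)


for year, groups in TABLE_III.items():
    control_gain = gain(*groups["control"])
    experimental_gain = gain(*groups["experimental"])
    margin = round(control_gain - experimental_gain, 2)
    print(year, control_gain, experimental_gain, margin)
```

A positive margin means the control group gained more than the experimental group. The sketch compares means only and says nothing about statistical significance.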


The control groups were given no sugges- 
tions or directions for the use of the twenty- 
four hours of the day. At the beginning of 
the third week of the semester each member 
of the experimental groups was asked to keep 
a daily record for seven consecutive days of 
just what she did during the twenty-four 
hours of each day. At the end of the seven 
days each individual brought her record for 
the week to the writer for analysis and dis- 
cussion. During the conference, each stu- 
dent made suggestions for improving the ex- 
penditure of time for the week. On a mime- 
ographed daily schedule blank the student, 
with the help of the writer, outlined her 
scheduled hours for recitation, laboratory 
work, and required assemblies with red 
pencil. The hours which were selected for 
study were outlined in blue. Care was taken 
that no one should plan for less than twenty 
hours of study each week. Each individual 
of this group was requested to keep her daily 
schedule blank in a convenient place for 
reference and to follow it throughout the 
semester. Also, at this conference, each in- 
dividual was given a mimeographed page of 
suggestions for study. Various items on this
page were selected by the student for discus- 
sion and amplification. 








\ 46 JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 7, No. ; 






TABLE I


1935 EXPERIMENT: THE MEANS AND STANDARD DEVIATIONS OF THE CONTROL AND EXPERIMENTAL
GROUPS FOR AGE, HIGH-SCHOOL AVERAGE, 1934 EDITION OF THE AMERICAN COUNCIL ON
EDUCATION PSYCHOLOGICAL EXAMINATION, AND THE NEW YORK READING COMPREHENSION
TEST, 1935-1936 EDITION


                                 Age        High School     Psychological      Reading
                                             Average         Test Score       Test Score
Group          No.   Sex    Years  Mos.    Mean     SD      Mean      SD      Mean     SD
  1             2     3       4     5       6       7        8        9       10      11
Control        31     F      18     2     80.30    4.17    204.48   36.36    52.87    5.96
Experimental   31     F      18     2     80.36    3.82    204.74   35.00    52.68    7.10

TABLE II


1936 EXPERIMENT: THE MEANS AND STANDARD DEVIATIONS OF THE CONTROL AND EXPERIMENTAL
GROUPS FOR AGE, HIGH-SCHOOL AVERAGE, 1936 EDITION OF THE TEACHERS COLLEGE PSYCHO-
LOGICAL EXAMINATION, AND THE ADVANCED IOWA SILENT READING TEST, FORM A (REVISED)


                                 Age        High School     Psychological      Reading
                                             Average         Test Score       Test Score
Group          No.   Sex    Years  Mos.    Mean     SD      Mean      SD      Mean     SD
  1             2     3       4     5       6       7        8        9       10      11
Control        46     F      18     2     78.91    4.00     68.50   22.71   141.63   24.05
Experimental   46     F      18     0     79.30    3.71     67.70   22.84   141.40   20.8

TABLE III


MEAN AND STANDARD DEVIATIONS OF CONTROL AND EXPERIMENTAL GROUPS FOR PRE-TEST AND
END-TEST IN THE 1935 AND 1936 EXPERIMENTS


                                  End-Test              Pre-Test
                               Mean   Standard       Mean   Standard     Difference
                                      deviation             deviation
                                 1       2             3       4             5

1935 Experiment
Control group                  94.42   11.16         66.20   13.35         28.22
Experimental group             92.35    9.55         67.02   13.45         25.33
Difference in favor of control group                                        2.89

1936 Experiment
Control group                  82.82   11.62         56.10   13.65         26.72
Experimental group             79.02   12.89         61.85   20.63         17.17
Difference in favor of control group                                        9.55


During the sixteenth week of the semester,
each member of the experimental group was
given another daily schedule blank to be
filled in during the following seven days.
The daily schedule blanks filled out during
the third week and during the sixteenth week
were summarized under the general headings
of: sleep, recitation and laboratory, eating,
study, recreation, church attendance, personal
care, miscellaneous activities, work,
commuting, and wasted time. Tabulations
were made for comparison.

At the end of the first semester of each
school year the grades for each individual in
the control and experimental groups were obtained
from the registrar. The marking system
used in Geneseo is the usual five point
system, A, B, C, D, and E. A quality-point
system is also used. In this system A is given
four quality-points, B is three, C is two, D is
one, and E is zero.

A check on the comparative achievement
of the groups was again made at the end of
the second semester of the school year. In
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the 1935 experiment only twenty-six pairs of 
students continued in school throughout the 
year. In the 1936 experiment withdrawals 
and eliminations reduced the original number 
of pairs from forty-six to forty-two. 
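
The quality-point system described above (A is four, B three, C two, D one, E zero) can be sketched as follows. The grade list is hypothetical, and the unweighted average is an assumption, since the text does not say whether courses were weighted by credit hours.

```python
from typing import Sequence

# Quality-points per letter grade, as stated in the text.
QUALITY_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}


def average_quality_points(grades: Sequence[str]) -> float:
    """Unweighted mean of quality-points for one student's semester grades."""
    return sum(QUALITY_POINTS[g] for g in grades) / len(grades)


# Hypothetical semester record for one student.
semester = ["A", "B", "C", "C"]
print(average_quality_points(semester))
```

Averages of this kind, taken for each member of a pair, are the quantities whose group means are compared in the results.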


Although no attempt was made to equate 
the groups in terms of other factors than sex, 
age, high school averages, psychological score, 
and reading score, a questionnaire filled out 
by each student in both groups sought to 
determine whether other factors which were 
not controlled might be sufficiently different 
to account for differences in achievement. 
The factors considered in the questionnaire 
were: local high school and Regent’s diplo- 
mas, year diploma was granted, population of 
high school, number in graduating class, 
number of individuals and amount of time 
consumed in commuting, amount of time 
available for study, interest in teaching, and 
facilities for study. The differences in the 
means and ranges of these tabulations are 
not sufficiently large to indicate that either 
group had an advantage over the other in 
these factors, which were not considered 
when equating the groups. 

At the end of the first semester and again 
at the end of the second semester the mean 
average quality-point grade difference was in 
favor of the control group. These differences 
were small and only one, the average quality- 
point grade for the second semester in the 
1935 experiment (Table IV, Column 6), approaches
being significant. It follows that
freshmen normal school girls did not profit 
from directions and assistance in budgeting 
time as far as scholastic achievement was 
concerned. 
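
The ratio reported in Column 6 of Table IV can be illustrated with a short computation. The sketch below is not the author's worksheet. It assumes the familiar formula for the standard error of the difference between two correlated means; the function names and the sample standard deviations are hypothetical. The differences and standard errors printed for the 1936 experiment are used as a check on the ratio itself.

```python
import math


def se_of_mean_difference(sd_control, sd_experimental, r, n_pairs):
    """Standard error of the difference between two correlated means.

    Assumed formula (standard for matched pairs):
        sqrt(se_c**2 + se_e**2 - 2 * r * se_c * se_e)
    where se_c and se_e are the standard errors of the two group means
    and r is the correlation between the matched pairs.
    """
    se_c = sd_control / math.sqrt(n_pairs)
    se_e = sd_experimental / math.sqrt(n_pairs)
    return math.sqrt(se_c ** 2 + se_e ** 2 - 2 * r * se_c * se_e)


def ratio_to_standard_error(difference, se_difference):
    """Ratio of a mean difference to its standard error (sign dropped)."""
    return abs(difference) / se_difference


# Differences and standard errors as printed for the 1936 experiment.
first_semester_1936 = ratio_to_standard_error(-0.04, 0.08)
second_semester_1936 = ratio_to_standard_error(-0.13, 0.097)
print(round(first_semester_1936, 2), round(second_semester_1936, 2))

# Hypothetical figures, only to show how matching shrinks the standard error.
unmatched = se_of_mean_difference(0.5, 0.5, r=0.0, n_pairs=25)
matched = se_of_mean_difference(0.5, 0.5, r=0.5, n_pairs=25)
print(round(unmatched, 3), round(matched, 3))
```

The two computed ratios agree with the printed values of .50 and 1.34. Both fall well short of 3, the figure customarily demanded at the time before a difference was called reliable.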


SUMMARY AND CONCLUSION 


This study sought to determine the relative 
value of definite planning for the twenty-four 
hours of the day as compared with unplanned 
use of the same time on the scholastic achieve- 
ment of normal school freshmen girls. This 
study differs from others in its field in that 
it is a controlled experiment with two 
matched groups. It extended over a period 
of one semester of sixteen weeks with an ad- 
ditional check on achievement after the sec- 
ond semester. The experiment was con- 
ducted with freshmen girls at the Geneseo 
State Normal School, Geneseo, New York. 


Two groups of thirty-one members each 
were equated on the basis of sex, age, high 
school averages, psychological test scores, and 
reading test scores. The experiment was re- 
peated in 1936 with two groups of forty-six 
members each. The tabulated results of a 
question list each year showed that the two 
groups were remarkably similar in kind of 
high school diploma, year diploma was 
granted, population of high school, number in 
graduating class, number of individuals and
amount of time consumed in commuting,
amount of time required and amount of time
available for study, interest in teaching, and
facilities for study. The differences in the
means and ranges of these tabulations are not
sufficiently large to indicate that either group
had an advantage over the other on those
factors which were not considered when
equating the groups.


TABLE IV
COMPARISON OF QUALITY-POINT GRADE DIFFERENCES FOR THE 1935 AND 1936 EXPERIMENTS*

                               (1)        (2)        (3)          (4)              (5)          (6)
                              Mean,      Mean,    Difference   Correlation of    Standard     Ratio,
                             Control    Experi-    of Means    Matched Pairs     Error of     D to σ_D
                              Group     mental                 of Control and    Mean
                                        Group                  Experimental      Difference
                                                               Groups

1935 Experiment
  Average quality-points,
    first semester           [illeg.]    2.30      [illeg.]      [illeg.]        [illeg.]     [illeg.]
  Average quality-points,
    second semester          [illeg.]    2.57      [illeg.]      [illeg.]        [illeg.]      2.54

1936 Experiment
  Average quality-points,
    first semester           [illeg.]    2.27       -.04         [illeg.]         ±.08          .50
  Average quality-points,
    second semester          [illeg.]    2.50       -.13           .01            ±.097        1.34

[Two further figures survive in the 1935 rows, .20 (first semester) and .11 (second semester), but their columns cannot be identified.]

*The standard error of the mean difference takes into account the matched pair correlation of the
control and experimental groups. Thus: [formula illegible]

The members of the experimental groups 
were given group and individual directions 
for budgeting the twenty-four hours of each 
day. Each individual made a daily schedule 
of activities and was urged to follow it. The 
control groups were given no suggestions or 
directions for the use of the twenty-four hours 
of the day. 

Three measures of the effect of the experi- 
mental factor were made. An objective test 
covering the material to be presented in the 
child development course was constructed and 
used as a pre-test and end-test. At the end 
of the first semester and again at the end of 
the second semester, the marks of each indi- 
vidual in both groups were obtained from the 
registrar and translated into quality-point 
grades. The difference between the means on 
the end-test, and of the quality-point grades 
obtained at the end of the first and second 
semesters, was considered the measure of the 
effects of the experimental factor. 

The results of this study indicate that defi- 
nite planning for the use of the twenty-four 
hours of the day does not show any positive 
effect on the scholastic achievement of normal 
school freshmen girls. However, approxi- 
mately 97 per cent of the members in the ex- 
perimental group believed that the assistance 
they had been given in planning for the 
twenty-four hours of the day had been 
decidedly helpful. 

Since it seems entirely logical that the 
budgeting of one’s time for work should make 
for greater achievement in that work it be- 
comes necessary to consider factors which 
might account for the failure to find such 
positive effects in the present study: 

1. The method of budgeting time actually
followed by the students in the experimental
groups increased the number of hours of
study per week by four and a half hours or more
over the amount of time devoted to study at
the beginning of each experiment. Through
acquaintance and association with the members
of the experimental group in rooming
houses and elsewhere, the control groups may
have been stimulated to use their time to
greater advantage and may have developed a more
effective system of budgeting time than that
used by the experimental groups.

2. The method used to emphasize the im- 
portance of budgeting time may have had 
negative effects which cancelled or more than 
cancelled any positive effects on achievement. 
One member of the experimental group 
frankly stated that she had an aversion to 
“good advice” even when she knew that it 
was good. 

3. Many of the subjects in the experimen- 
tal and control groups knew each other and it 
is quite possible that individual rivalry between
members of the groups stimulated individuals
of the control groups to greater application
to their studies. However, group
rivalry probably was not a factor, since com- 
parisons between the two groups were not 
made by the instructors of the courses in child 
development. 

It is possible, when college students feel 
their achievements are unsatisfactory, that 
they will seek advice of some mature person 
in whom they have confidence. On such 
occasions, definite help in planning for the 
use of time might be used to advantage by the 
individuals. Until such time, the results of 
this study seem to indicate that it is a waste 
of effort to attempt to give students assistance 
in budgeting their time. 

To determine more conclusively the value 
of budgeting time on scholastic achievement, 
further research should be carried out with 
the following groups: 


1. Larger groups of college students.

2. Students in junior or senior classes of
college.

3. Older students whose education has
been interrupted for some reason or
other.

4. Students in adult educational classes.

5. Students who are dissatisfied with their
achievements and who seek personal
advice.


AN EVALUATION OF NORMAL SCHOOL SORORITIES*


Bess E. JOHNSON


State Normal School,
Geneseo, New York


Sororities have frequently been the object 
of discussion and criticism. These discus- 
sions and criticisms usually center around the 
social advantages and disadvantages of mem- 
bership in such organizations, the comparative 
scholastic achievement of sorority and non- 
sorority members, and housing with its 
attendant financial obligations. 


The statements concerning the social advantages
and disadvantages are both vehement
and contradictory. In an editorial comment
W. H. Cowley¹ quotes C. C. Little of
the University of Michigan as saying that 
“the greatest source of irritation is the dis- 
parity between the unfulfilled potentialities of 
the fraternities, the social and intellectual 
smugness of their members, the puniness of 
their constructive contributions, and the un- 
reality of their ideals”. Cowley further quotes 
the belief of President Hopkins of Dartmouth 
that “real education consists of the impact of 
youthful mind on youthful mind—very often 
this impact is more natural and genuinely 
available in fraternity groups than anywhere 
else in the college organization.”


Further evidence on the disparity of opinion
is seen in the statements of Angell, Duerr,
and Grant. Angell is of the opinion that “the
opportunities which house groups afford for
intimate companionship is perhaps their greatest
benefit.” He further states that fraternities
furnish “good training for active participation
in larger society,”² and “they make
possible intimate circles of friends with satisfying
social integration.”³ Angell’s study
of undergraduate adjustment showed that fraternity
members were twice as likely to be
found among the socially well integrated as
were non-fraternity members.⁴ He concludes
that “the case studies of fraternity and sororities
pretty generally agreed that these organizations
promote social facility, the ability
to understand others, self knowledge, and self
control; but they are likely not to increase
tolerance or stimulate one to think through
one’s own philosophy of life.”⁵


* Field Study [number illegible], Colorado State College of Education,
Greeley, Colorado.

1 W. H. Cowley, “... the Fraternity,” Journal of
Higher Education, V (May, 1934), 281-84.

2 R. C. Angell, The Campus, p. 71. New York: Appleton
and Company, [year illegible].

3 Ibid., p. 71.

4 R. C. Angell, A Study of Undergraduate Adjustment, pp.
[illegible]. Chicago: University of Chicago Press, [year illegible].


Duerr⁶ believes that fraternities assist their
members in attaining “true self realization” 
and states that “the fraternity is potentially 
the greatest single moral and—in a non- 
religious sense—spiritual force in college 
life.”⁷ Grant quotes one university president
as saying that the social fraternity was “the
greatest organized source of misbehavior on
the campus.”⁸


Some of the opinions that have been quoted 
apply particularly to men’s organizations. 
Perhaps it is not unfair to assume that these 
statements might also apply to women’s or- 
ganizations. The divergence of opinion that 
has been quoted leads the writer to believe 
that the desirability of fraternal organizations 
varies with the particular institution in which 
they are found. These organizations may 
stimulate and guide individuals to their 
greatest achievement or they may misguide 
and discourage them. 


In a study made by A. G. Heyhoe⁹ at
Doane College, it was found that the students
considered the personal contacts of
sorority and fraternity membership as eighth
in importance of fourteen factors in character
building. Angell¹⁰ found that students believed
membership in a sorority gave them
better chances of attaining success in extra-curricular
activities. Chase¹¹ believes that
students in sororities may be urged into extra-curricular
activities in which they have no
interest or competence, in which case participation
would be more injurious than beneficial to them.


5 Ibid., p. 118.

6 A. E. Duerr, “Fraternity and Scholarship,” American
Scholar, I (January, 1932), 17-20.

7 Ibid., p. 18.

8 Grant, “The Social Fraternity,” School and Society,
XXXIII (February 14, 1931), 229-33.

9 A. G. Heyhoe, “Character Building Factors at Doane College,”
Religious Education, XXIV (May, 1929), 455-57.

10 R. C. Angell, The Campus, pp. 67-86.

11 H. W. Chase, “Fraternities under Present Day Condition,
with Discussion,” Transactions and Proceedings, Vol. XXX,
60-84. Madison, Wisconsin: National Association of State
Universities, 1932.


Non-sorority groups frequently accuse 
sorority groups of being snobbish and of 
thinking themselves superior. There seems to 
be no evidence that sorority members do con- 
sider themselves superior or snobbish. This 
does not eliminate the fact that each year a 
considerable number of individuals are seri- 
ously disappointed because they are not 
selected for membership by some sorority 
group and, as a consequence, harbor feelings 
of resentment and inferiority. Angell¹² found
that approximately half of the individuals 
who were not members of these organizations 
would like to become members, if circum- 
stances permitted. 


The scholastic attainments of sorority and
non-sorority groups have been studied in a
number of colleges and universities, with conflicting
results. At the University of Wisconsin,
Byrns¹³ found that the average scholastic
achievement for sorority members was considerably
higher than for non-sorority women.
She further found that students living in
dormitories maintained by the university received
lower grades than did students living
in houses maintained by student groups.
Worcester¹⁴ also found that the average
standing of sorority members was higher than
that of non-sorority members.


A study of 2,817 students made by Eurich
at the University of Maine leads to the conclusion
that “the poorer student has better
chances if he does not belong to a fraternity,
while the better student appears to be able to
do superior work with a fraternity environment,”
while the “fraternity environment does
not affect scholastic achievement of average
college students.”¹⁵ A conclusion of the study
made by J. B. Johnston at the University of
Minnesota was “that fraternities and sororities
do not contribute to the improvement of
scholarship.”¹⁶ In a subsequent study made
by Johnston¹⁷ it was found that in scholarship
the sorority pledges did a little better
than the average woman student.


12 R. C. Angell, A Study of Undergraduate Adjustment, pp.
112-18.

13 Ruth Byrns, “Concerning College Grades,” School and
Society, XXXI (May 17, 1930), 684-86.

14 D. A. Worcester, “Fraternities and Scholarship,” School
and Society, XVIII (August, 1923), 147-48.

15 A. S. Eurich, “The Relation of Achievement between College
Fraternity and Non-Fraternity Groups,” School and
Society, XXVI (November 12, 1927), 626.

16 J. B. Johnston, “Tests of Ability before College Entrance,”
School and Society, XV (April, 1922), 345-53.

17 J. B. Johnston, “Predicting Success or Failure in College
at the Time of Entrance,” School and Society, XX (July, ...


Although this evidence is conflicting, it 
seems fair to conclude that sororities cannot 
be said to interfere sufficiently with scholastic 
achievement to justify their elimination on 
this basis. The scholarship standings of 
sororities have been materially raised in a 
number of institutions where special attention 
has been given this matter. At the University
of Wisconsin¹⁸ a certain scholarship average
is required before an individual is
initiated.

Assuming the financial responsibility of a 
chapter house necessitates a degree of ma- 
turity, judgment, and discrimination which 
may be beyond the ability of college students. 
When a sorority undertakes to maintain a 
chapter house, the financial status of the 
prospective members assumes new signifi- 
cance. No sorority can afford to elect mem- 
bers who cannot meet the financial obligations 
involved. At the University of Montana, 
Speer¹⁹ found that nearly all of the nine fraternities
and ten sororities on the campus had
difficulty in maintaining a membership which 
was financially able to maintain the club 
house. At the beginning of each year the 
difficulty of filling the house with new mem- 
bers always had to be met. This tended to 
make the financial backing of an individual 
loom large in her selection for initiation, and 
less consideration was given to the election of 
congenial members. As a result, the sorority 
members appeared aloof and unsociable to the 
non-sorority members. 

A summary of the previous studies leads to 
the following conclusions: (1) That sororities 
are not universally accepted as desirable or- 
ganizations is evidenced by the divergence of 
opinions expressed in the literature about 
them. (2) Where studies of the comparative 
scholastic achievements of the sorority and 
non-sorority groups have been made there is 
conflicting evidence. This evidence is not 
sufficient to warrant the conclusion that these 
organizations should be abolished. (3) When 
considering the effect of the sorority in devel- 
oping desirable social attributes, diametrically 
opposed attitudes are expressed by those who 
are in a position to know. (4) The available 
evidence concerning the financing of sorority 
houses is limited; however, college students
with a minimum of experience in financial
matters should use discretion and foresight
before assuming obligations which may involve
them in a great many difficulties.


19 Speer, “... and Their Money Matters,” School and Society,
XXXVI (October, 1932), [first page illegible]-74.


THE PRESENT INVESTIGATION 


The purpose of the present study is fourfold:

1. To determine the relative scholastic 
achievement of sorority and non-sorority 
members, as a basis for determining whether 
sorority organizations, as they exist at Gen- 
eseo, New York, interfere with achievement. 

2. To ascertain the judgment of sorority 
and non-sorority alumnae regarding the value 
of such organizations to the student body. 

3. To ascertain the judgment of the soror- 
ity alumnae as to the advisability of attempt- 
ing to establish and maintain sorority chapter 
houses. 

4. To determine the relative activity of 
sorority and non-sorority members in such 
campus organizations as are open to all 
students. 

From the graduating classes of 1933, 1934, 
1935, and 1936, 100 sorority alumnae were 
paired with 100 non-sorority alumnae on the 
basis of age, date of entrance to Geneseo Nor- 
mal School, and scores on the Ohio State Uni- 
versity Psychological Test. The results of 
this pairing (Table I) show a small difference 
in favor of the non-sorority groups. The 
school grades for all subjects pursued, as ob- 
tained from the registrar’s records, were 
translated into quality-points and averages 
were calculated. The quality-point system in 
use at this time weights each A obtained in a 
course as three, each B as two, and each C 
as one point. Thus a two credit-hour course 
for a semester with a grade of A has six 
quality-points, a grade of B has four, and a
grade of C has two. Ninety-six semester 
hours of work are required for graduation. 
The average grade of each individual was 
found by dividing the total number of 
quality-points by the total number of semes- 
ter hours of credit. Correlations were com- 
puted by the Pearson product-moment 
method. The correlation between psycholog- 
ical scores and average grades for the sorority 
groups was found to be .39. For the non- 
sorority groups it was found to be .24. In 
Table I the quality-point grade averages of 
the two groups were seen to be practically the 
same. The difference of .04 is in favor of the 
non-sorority groups and the ratio of this dif- 
ference to the standard error of the difference 
is [figure illegible]. Since these results do not show a significant
mean difference in grade achievement
it would seem that sorority membership 
versus non-sorority membership has no dif- 
ferentiating effect on the average scholastic 
achievement of the groups compared. 
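
The grade computation described above can be sketched in a few lines. This is an illustration under stated assumptions and not a record of the registrar's procedure: grades below C are assumed to carry no quality-points (the text names only A, B, and C), and the function names and sample figures are hypothetical.

```python
import math

# Quality-points per credit hour, as stated in the text.
# Grades below C are ASSUMED to earn zero points.
QUALITY_POINTS = {"A": 3, "B": 2, "C": 1}


def quality_point_average(courses):
    """Total quality-points divided by total semester hours.

    `courses` is a list of (letter_grade, credit_hours) pairs.
    """
    total_points = sum(QUALITY_POINTS.get(grade, 0) * hours
                       for grade, hours in courses)
    total_hours = sum(hours for _, hours in courses)
    return total_points / total_hours


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A two credit-hour course with a grade of A carries six quality-points,
# so a record consisting of that course alone averages 3.0.
one_course = quality_point_average([("A", 2)])

# Hypothetical record: 6 + 6 + 3 = 15 points over 8 hours.
sample_average = quality_point_average([("A", 2), ("B", 3), ("C", 3)])

# Hypothetical psychological scores and grade averages for five students.
scores = [114, 126, 135, 148, 160]
averages = [1.2, 1.1, 1.5, 1.4, 1.7]
print(one_course, sample_average, round(pearson_r(scores, averages), 2))
```

Dividing by semester hours and not by the number of courses is what keeps a two-hour course from counting as heavily as a three-hour one.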


In order to get information for the other 
three problems of the present study, letters 
enclosing check-lists were sent to 200 sorority 
and 200 non-sorority alumnae of the classes 
of 1933, 1934, 1935, and 1936. A total of 
135 check lists, or 67.5 per cent, were re- 
turned by the non-sorority alumnae, and 151 
or 75.5 per cent, by the sorority alumnae. 
Twenty-six per cent of the sorority alumnae 
had been members for one year, 26 per cent 
for two years, and 48 per cent for three years. 


The responses to the check lists will be con- 
sidered under the following heads: election 
to membership, finances, social advantages, 
scholarship, more sororities, sorority houses, 
and school activities. 


TABLE I
ENTRANCE, AGE, PSYCHOLOGICAL SCORES, AND AVERAGE QUALITY-POINT GRADES

                               Age                  O.S.U.                Quality-Point
                        in years and months   Psychological Scores       Grade Average
                                     Non-                   Non-                   Non-
                        Sorority   Sorority    Sorority   Sorority    Sorority   Sorority
Number                    100        100         100        100         100        100
Median                    18-2       18-1.9     133.50     135.50       1.36       1.37
Q1                        17-8       17-8       114.61     116.88       1.17       1.19
Q3                        18-8       18-8       153.50     154.38       1.60       1.64
Q                         0-6        0-6         19.44      18.75        .22        .22
Mean                      18-3       18-2.9     136.70     137.90       1.41       1.45
Standard deviation        0-10.4     0-10.2      26.15      25.95        .30        .34
Standard error of mean    0-1.0      0-1.0        2.62       2.59        .03        .03


A. Election to Membership


The results of the returned check lists show
that approximately 25 per cent of the sorority
alumnae had previous family connections which
might tend to make these individuals “legacies,”
but only 8 per cent had been recommended by
some former graduate. Approximately 84 per cent
believed that they had been invited to membership
on their own merits. Only 10 per cent believed
their invitations were due to the fact that 
they were living at houses with other girls 
who were sorority members. In the non- 
sorority groups 86.6 per cent stated that they 
had not been rushed by any sorority, and 
76.3 per cent stated that they were not 
greatly disappointed. To be rushed by a 
sorority and not given an invitation to join 
the sorority was considered more embarrass- 
ing than not to be rushed at all by 91.6 per 
cent of the non-sorority group who responded. 

Two other factors which have some signifi- 
cance in predicting which girls will ultimately 
become sorority members are: urban or rural 
residence while in the elementary school and 
high school, and commuting. Of the sorority 
alumnae 26.5 per cent had lived in the coun- 
try while attending the elementary school, 
and this number was decreased to 22.6 per 
cent who had been rural residents while at- 
tending high school. In the non-sorority 
group 51.8 per cent had lived in the country 
during elementary school years, and 44.5 per 
cent had lived in the country while attending 
high school. The difference in rural residence 
during elementary school years is 25.3 per 
cent while during high school it is 21.9 per 
cent. Both these differences are significant 
since the ratio of the difference to the stand- 
ard error of the difference is 4.5 and 4.0 re- 
spectively. This is in accord with the results 
obtained by Angell.²⁰ In the sorority group
8.6 per cent were commuters, while in the 
non-sorority group 14.1 per cent were com- 
muters. The difference, 5.5 per cent, is not 
statistically significant since the ratio of this 
difference to the standard error of the differ- 
ence is 1.48. 
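
The ratios quoted in this paragraph can be approximated with the usual formula for the standard error of a percentage. The sketch below is an assumption about the method, which the text does not state: it takes the group sizes to be the numbers of check lists returned (151 sorority, 135 non-sorority), treats the two groups as independent, and uses hypothetical function names.

```python
import math


def se_of_percentage(percentage, n):
    """Standard error of a percentage: sqrt(p * (100 - p) / n)."""
    return math.sqrt(percentage * (100 - percentage) / n)


def ratio_of_difference(p1, n1, p2, n2):
    """Difference between two percentages divided by its standard error."""
    se_difference = math.sqrt(se_of_percentage(p1, n1) ** 2
                              + se_of_percentage(p2, n2) ** 2)
    return abs(p1 - p2) / se_difference


N_SORORITY, N_NON_SORORITY = 151, 135  # returned check lists

rural_elementary = ratio_of_difference(26.5, N_SORORITY, 51.8, N_NON_SORORITY)
rural_high_school = ratio_of_difference(22.6, N_SORORITY, 44.5, N_NON_SORORITY)
commuters = ratio_of_difference(8.6, N_SORORITY, 14.1, N_NON_SORORITY)

print(round(rural_elementary, 1), round(rural_high_school, 1),
      round(commuters, 2))
```

Under these assumptions the first two ratios round to the 4.5 and 4.0 given in the text. The commuting ratio comes out slightly below the printed 1.48, a gap that rounding of the percentages could account for, and it remains far too small to be called significant.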


B. Finances 


Sororities are frequently criticized for ex- 
cessive financial demands made upon the 
members. At Geneseo each new member pays 
an initiation fee of $5.00. Other assessments 
made by each sorority probably do not exceed 
$15.00 per year. Approximately 40 per cent
of the non-sorority groups said that the additional
financial demands incurred with membership
deterred them from joining a sorority.
In the sorority groups 96 per cent did not find
the financial obligations a serious factor, and
90 per cent believed the experience of belonging
to a sorority was sufficiently valuable to
recompense for the added expense.

20 R. C. Angell, The Campus, pp. 67-68.


C. Social Advantages 


A small majority of the non-sorority alum- 
nae, 59 per cent, did not believe that sorority 
membership was an advantage in developing 
desirable social traits and 65.9 per cent con- 
tended that it made the members snobbish 
and aloof. From the sorority alumnae the 
returns showed that 90 per cent believed the 
sorority affiliations had aided in developing 
worthwhile social traits, and 24 per cent con- 
sidered themselves snobbish and aloof. Sev- 
enty-six per cent of the sorority alumnae re- 
ported that their best friends were sorority 
members, while 24.6 per cent of the non- 
sorority alumnae had found their best friends 
among the sorority members. Ninety-six per 
cent of the sorority alumnae believed their 
moral standards to be as high as the stand- 
ards of non-sorority members; however, only 
58 per cent of the non-sorority group were of 
this opinion. As a supplement to their train- 
ing for teaching, sorority membership was 
considered desirable by 91 per cent of its
alumnae, but only 40 per cent of the non- 
sorority alumnae considered it valuable. 


D. Scholarship 


Another frequent criticism of the sororities 
is directed toward scholarship. Instructors 
and others have contended that sorority mem- 
bers devote so much time to social activities 
that classroom work is neglected. In the non- 
sorority group 89.6 per cent believed that 
membership in a sorority does not interfere 
with scholastic achievement, and the sorority 
group was almost unanimously (99.3 per 
cent) of the same opinion. The results of 
the first part of this study confirm these opin- 
ions since the average scholastic means for 
the two groups were found to be practically 
identical. 


E. More Sororities? 


If sororities provide desirable opportunities 
for social growth and development, there 
should be enough sororities on a campus so 
that these opportunities will be available for 
all. At Geneseo the registration is limited to
[number illegible] students and approximately 550 of these
are girls. Of the sorority alumnae replying 
to the questionnaire 80.9 per cent were not in 
favor of adding new organizations to the four 
which are in existence on the campus, but 
88.1 per cent of the non-sorority group be- 
lieved it advisable that new ones be added. 


F. Sorority Houses 

No provision is made for sorority rooms in 
the new building which is under construction 
at Geneseo. In the old building each sorority 
had been assigned a room which its members 
furnished and used for business and recre- 
ational purposes. At the present time each 
sorority has selected a residence as temporary 
headquarters. For this privilege each group
is under obligation to the landlord to pay
rent for a specified number of rooms. The 
returns of the check list showed a decided un- 
certainty about the advisability of maintain- 
ing or building a chapter house. In the soror- 
ity group 55.7 per cent believed it advisable, 
while in the non-sorority group only 28.9 per 
cent were of this opinion. As many as 57.6 
per cent of the sorority alumnae signified a 
willingness to aid their sisters in maintaining 
a sorority house. 


G. Student Activities 


A check list consisting of thirty-one activ- 
ities open to all students was sent to the 400 
alumnae. This list of activities consisted of 
five student government and class organizations,
one honorary journalistic fraternity, one
honorary scholastic fraternity, ten clubs, three 
musical organizations, nine committees in 
charge of major school activities, and two 
school publications. The returns of the check 
list showed that the average sorority member 
had been engaged in five activities, whereas 
the average non-sorority member had been 
engaged in three. Thirty and four tenths per 
cent of the sorority groups had been presi- 
dents and vice-presidents whereas 10.3 per 
cent of the non-sorority group had engaged in 
these activities (Table II). As chairman of 
various committees, the participation of the 
sorority group in activities was 27 per cent 
above that of the non-sorority group. As for 
participation in the school publications, 19.9 
per cent of the non-sorority had held major 
offices, whereas 15.7 per cent of the sorority 
group had held them. 

In order to obtain distinction, sororities 
may urge their members into activities where 
there is neither interest nor ability. It is true 
that sororities are better organized to support 
members who seek office in school activities 
but the writer is not aware of any serious in- 
justice having been done a sorority member 
by over stimulation in school activities. 
There are, no doubt, many non-sorority mem- 
bers who would be equally competent but who 
seem to lack the initiative and the support for 
election to offices in school activities. Besides 
the growth which comes from the participa- 
tion in the activities open to all students, the 
sorority members have the opportunities for
growth which come from planning and directing
their own organizations. At Geneseo a
non-selective social organization on the campus
is open to all non-sorority people. Only
about 29 per cent of those who returned check
lists had belonged to this organization.


TABLE II

PERCENTAGE OF SORORITY AND NON-SORORITY ALUMNAE ENGAGING IN
DIFFERENT STUDENT ACTIVITIES

                                Sorority Alumnae      Non-Sorority Alumnae
                                    N = 151                 N = 135
                                Per-      Standard     Per-      Standard    Differ-   Standard error   Ratio of
Activity                        centage   error        centage   error       ence      of difference    D to σ_D
 1. President                    21.1     ±3.32          9.6     ±2.54        11.5        ±4.18           2.75
 2. Vice-President                9.3     ±2.36           .7     ± .71         8.6        ±2.47           3.44
 3. Secretary                    11.2     ±2.57          1.5     ±1.05         9.7        ±2.77           3.50
 4. Treasurer                     2.6     [illeg.]       ...      ...          2.6        ±1.29           1.99
 5. Student Council              16.5     ±3.02          4.4     ±1.77        12.1        ±3.50           3.46
 6. Executive Council            23.1     ±3.43          3.7     ±1.63        19.4        ±3.80           5.10
 7. Other Officer                29.1     ±3.70         17.7     ±3.29        11.4        ±4.95           2.23
 8. Chairman                     34.4     ±3.87          7.4     ±2.25        27.0        ±4.47           6.04
 9. [label illegible]             2.6     [illeg.]       3.7     ±1.68        -1.1        ±2.31            .47
10. Business Manager              1.9     [illeg.]       ...      ...          1.9        ±1.11           1.71
11. Head of a Department
      of a publication           11.2     ±2.57         13.3     ±2.93        -2.1        ±3.89            .54
12. Managing Editor               ...      ...           2.9     ±1.44        -2.9        ±1.44           2.00


A summary of the percentage of activity 
for the two groups is reported in Table II. 
The differences of the percentages were in 
favor of the sorority group except items nine, 
eleven, and twelve. Items two, three, five, 
six, and eight all show significant differences 
between the groups. In items one, four, 
seven, and ten the differences are in favor of 
the sorority groups, but these differences are 
not reliable. In items nine, eleven, and 
twelve, where the differences are in favor of 
the non-sorority groups, the differences are 
not reliable. 


SUMMARY 


A survey of the literature in the field re- 
veals a wide divergence of opinion about the 
value of sororities. Studies of scholastic 
achievement of sorority and non-sorority 
groups do not show evidence sufficiently con- 
vincing to warrant the conclusion that such 
organizations interfere with achievement. 
The results of this study indicate that the 
sorority women are approximately twice as 
active in student organizations as the non- 
sorority women. This investigation has fur- 
ther shown that a girl’s chances of being 
elected to membership in a sorority are conditioned
by the fact that relatives or friends
have previously belonged to such organizations.
Other factors conditioning a girl’s
chances of election are rural residence, commuting,
and financial status of her family.
Sorority women believe their organizations to 
be a valuable aid in developing desirable 
social traits, while only about half of the non- 
sorority women believe these organizations to 
be valuable in this respect. The non-sorority 
women think that there should be enough 
sororities on the campus in order that many 
more students may have the opportunity to 
belong to such organizations. The sorority 
women believe that no new organizations 
should be started. Non-sorority women very 
strongly object to being rushed by a sorority 
and then not being given an invitation to join 
that sorority. Approximately half of the 
sorority alumnae would like to see their sisters 
living in chapter houses and would be willing 
to render financial assistance. 

Further research in this field is necessary: 
(1) to determine the comparative scholastic 
achievement of sorority and non-sorority 
groups where the sorority is living in a chap- 
ter house; (2) to determine the activity par- 
ticipation of students who live in groups of 
ten or more; (3) to determine the scholastic 
achievement of those individuals who are 
most active in student organizations; and 
(4) to determine the comparative teaching 
success of sorority and non-sorority women. 

THE EFFECT OF WRITTEN EXAMINATIONS ON LEARNING
AND ON THE RETENTION OF LEARNING* 


Bess E. JOHNSON 


State Normal School, 
Geneseo, New York 


Tradition seems responsible for the almost 
unanimous agreement that some form of ex- 
amination should be used in the schoolroom. 
Examinations are used for purposes of grad- 
ing, classification, promotion, and in fewer 
cases for determining the efficiency of teach- 
ing techniques. Although the tests in sub- 
ject matter fields are not generally used in 
the primary grades, they are almost universal 
in the intermediate grades, junior and senior 
high schools, colleges, and graduate schools. 
In the last two decades the subjective type 
of examination has pretty largely given way 
to the more objective examination, because 
the latter has been proved more reliable. 
These tests and examinations aim to deter- 
mine not only the student’s acquisition of 
factual information and fundamental prin- 
ciples, but also his ability to retain informa- 
tion. At the present time probably very few 
classroom examinations succeed in measuring 
the student’s ability to apply his knowledge 
in the interpretation of life situations and 
problems. 


1. REVIEW OF PREVIOUS STUDIES 


Several experiments have shown that where 
examinations have been used as incentives 
they have increased the acquisition of 
knowledge. To determine the effectiveness 
of the knowledge of marks on the subsequent 
achievement of college students, Fay¹ equated
two groups in terms of their percentile ratings 
on the American Council on Education Psy- 
chological Examination and their scores ob- 
tained on a test covering the material pre- 
sented in his psychology class. The mem- 
bers of the experimental group were informed 
that they would receive letter grade marks 
every four weeks, while no announcement 
concerning marks was made to the control 
group. The results of his study showed that 
the students in the experimental group who 

* Field Study No. 3, Colorado State College of Education, 

¹ P. J. Fay, “The Effect of the Knowledge of Marks on the
Subsequent Achievement of College Students,” Journal of
Educational Psychology, XXVIII (October, 1937), 548-54.


achieved B or C on the first test achieved 
less on the final test, while in the control 
group those who were rated B gained slightly 
on the final test and those who rated C lost. 
Those securing an A on the first test worked 
to retain their position while C students at- 
tempted to improve their standing. Fay con- 
cludes, “In spite of the obvious unreliability 
of the results A’s do better if they know their 
marks, B’s do less if they know their marks, 
C’s achieve more on the final if they know 
their marks. A closed marking system con- 
fuses students. Students of low intelligence 
particularly need to know their marks as an 
incentive.”²

Deputy³ sought to determine the influence
of frequent knowledge of success on the test 
scores of three groups of freshman students 
studying philosophy. One group was given 
a ten minute test each time the class met 
(twice a week), another group was given a 
twenty minute test once a week, and the third 
group had daily oral reviews but no tests. 
The results of learning under these conditions 
were measured by informal objective tests. 
“The section which had written work each 
time it met, did significantly better than the 
one which had written work once each week 
or the one which had only the recitation 
work. The section which had written work 
only once a week, was not enough better, to 
be significant, than the one without any writ-
ten work.”⁴ When the control group became
the experimental group, it did not show any 
marked signs of improvement over the pre- 
vious experimental groups which were now 
acting as control groups and had no written 
tests until the end of the semester. This lack 
of improvement may have been due to the 
fact that the original experimental group had 
acquired study habits which were being em- 
ployed to their advantage. However, Deputy 
is inclined to believe that the lack of im-
provement was “due to the fact that their
written work came during the second half of
the semester, after a half semester of only
oral recitation.”⁵

² Ibid., p. 554.
⁴ Ibid., p. 331.

A study of the value of the final examina-
tion was made by T. H. Schutte⁶ at the
Oregon Normal, Monmouth, Oregon, using 
one hundred students in his Introduction to 
Education courses. Control and experimen- 
tal groups were selected on the basis of the 
comparative achievement on the Otis Self- 
Administering Test after those who were over 
24 and under 18 years of age were eliminated. 
The experiment extended over a period of 
twelve weeks with the classes reciting three 
hours each week. Both groups were given the 
same lectures and assigned material. The 
examination group was told once or twice a 
week that the material should be mastered 
since it might be called for in the final test, 
while the control group was urged to do thor- 
ough work since there would be no provision 
for review and no final examination was to be 
given. Brief tests dealing with the lectures, 
class discussions, and assigned readings were 
given to both groups each week. These tests 
were objectively scored, as was the final ex- 
amination which consisted of 180 items. The 
results showed a correlation of .66 ± .04
between the mean scores on the short tests
and final examination for the non-examination
group and a correlation of .79 ± .025 for the
examination group. Cramming for the final 
examination was precluded by exacting as- 
signments for both groups the week before the 
final examination. Schutte concludes, “Doubt- 
less the examination group learned with a 
more definite intention of retaining the ma- 
terial of the course than did the other 
group.”⁷ It is possible that the retention to
which Schutte refers did not extend beyond 
the period of the final examination. 

In a controlled experiment Scott⁸ at-
tempted to evaluate the use of the final ex-
amination as an instructional device with a
total of 805 junior and senior high school 
students divided into 37 instructional groups. 
Control and experimental groups were 
equated on the basis of intelligence test 
scores. Standard tests and informal objec-
tive tests were given in each of the subject
matter fields at the beginning and again at
the end of the experimental period, the school
year.

⁵ Ibid., p. 333.
⁶ … in Final Exams?” Journal of Research, XI (October 1925), 204-13.
⁷ Ibid., p. 211.
⁸ I. O. Scott, “Stimulating Learning Through the Use of the …

Scott concludes from his study “that
examinations as instructional devices caused 
significant differences in the achievement of 
students in all of the classes studied with the 
exception of Geography 7, English 10, and 
English 11, as this experiment revealed a 
completely reliable difference in favor of 
using both standard tests and teacher-made 
tests as aids to learning.”⁹ He believes that
“the use of final tests is worthwhile and that
the time consumed by the teachers and stu-
dents in the preparation and taking of such
examinations is well spent.”¹⁰

This study made no attempt to determine 
whether the permanence of the greater 
achievement of the examination group would 
continue after an interval of several months. 

Jersild¹¹ used two sections studying psy-
chology in equivalent group experiments. In
each experiment one group was given a pre- 
examination before any time had been given 
for study of assignments. The other group 
which served as a control was not pre- 
examined. Both groups were given the same 
final examination. When the examination 
consisted of true-false statements, the pre- 
examined group made a lower score than did 
the control group which had not had the test. 
When multiple-choice and essay examinations 
were used, the results showed a 5 to 20 per 
cent higher score for the pre-examined group. 

Kulp’s¹² study with graduate students in
educational sociology at Teachers College, 
Columbia University, showed that older stu- 
dents were stimulated in learning by the use 
of tests. As a result of ten minute tests given 
weekly the “low” half of the group gained 
significantly over the “high” half of the class 
which had no weekly tests. 

White¹³ found that his students in general
psychology who were urged to study their 
returned weekly test papers for a final exam- 
ination which would include the same ques- 
tions did 51.15 per cent better than the stu- 
dents who used their returned test papers at 
their own discretion. 

⁹ Ibid., p. 34.

These studies by Fay, Deputy, Schutte,
Scott, Jersild, Kulp, and White have shown 
rather conclusively that examinations stimu- 
late the immediate achievement of students 
of various age groups. However, they do 
not offer evidence to show whether or not this 
greater achievement will continue when the 
students are tested for delayed recall or per- 
manence of learning. 

Sister Felicita Gable used a group of ninth 
grade biology students “to determine the 
effect on pupil achievement of a system of 
anticipated daily check testing as compared 
with frequent unannounced unit tests.”¹⁴
Two groups were equated for mental age,
initial knowledge of biology, and socio-
economic status. The findings of the inves-
tigation were based on ninety-nine daily
checks, one hundred unannounced, and 
seventy-five announced tests. The results 
obtained showed that the students who had 
announced and unannounced unit tests given 
at longer intervals did better than did those 
who had daily tests, and that after three 
months they continued to hold their lead 
with differences decreasing. 

In an attempt to determine the influence 
of weekly and monthly tests on learning and 
retention, Noel Keys,¹⁵ at the University of
California, matched two groups each of 143 
students in educational psychology on sex 
and pre-test scores. The experimental period 
was divided into three four-week periods.
During the first period the control group was 
given a “lump assignment” and the date of 
the mid-term test was announced, while the 
experimental group was assigned weekly 
readings by topics and chapters, and the 
dates for weekly tests were specified. For 
the second period the same weekly assign- 
ments were given to both groups, but the dif- 
ference in testing procedure continued as in 
the first period. For the third period weekly 
assignments were given only to the experi- 
mental group, but both groups received one 
monthly test. “The control section met 
after the experimental and probably enjoyed 
a certain advantage on that account. To off- 
set this as far as practicable, one-fourth of 
the lectures and one-half of the total tests 
and examinations were scheduled to be given 
first to the control. A further possible ad-
vantage to the control group lay in the fact
that 14 per cent of its members were grad-
uate students, as against 6 per cent of the
experimental.”¹⁶

The total mean gain of the weekly-tested 
group over the monthly-tested group on the 
periodic test material was 26.5 ± 4.1. The
difference in gains in scores on an unan-
nounced final examination after a lapse of
five to fourteen weeks was 4.2 ± 1.4 in favor
of the weekly-tested group. However, on the 
regular end-term examination where all had 
equal opportunity to cram the scores were 
the same for both groups. 

Pease¹⁷ conducted an experiment with 204
high school and college students to determine 
the influence of cramming on achievement. 
The control group was given an objective test 
of 100 items without warning, while the ex- 
perimental group was dismissed and urged to 
spend at least an hour in cramming for the 
same test, which would be given at the next 
class meeting. The results showed that the 
group which crammed gained 11.1 points 
over the control group, but six weeks later 
when the identical test was given to both 
groups, the gain was reduced to 6.3 points. 
Pease repeated the experiment with two sub- 
sequent groups of students. One cramming 
group of 94 students with a mean intelligence 
of 139.1 gained 8.2 ± 2.76 and six weeks
later their gain was reduced to 4.8 ± 2.25.
Another cramming group of 82 with a mean
intelligence score of 163.4 gained 17.07 ±
3.32, but twelve weeks later their gain was
reduced to 2.7 ± 3.17. These results seem
to indicate that the greater achievement 
which is induced by cramming is of a tem- 
porary nature. 
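
The shrinkage of the cramming advantage can be made concrete with a little arithmetic. The sketch below, written in Python purely for illustration, expresses each later gain as a percentage of the immediate gain, using only the figures quoted above; the group labels and the function name are hypothetical conveniences, not Pease's terms.

```python
# Gains of the cramming groups over their controls, as quoted in the text:
# (illustrative label, immediate gain, later gain, weeks between the two tests)
PEASE_GAINS = [
    ("first experiment", 11.1, 6.3, 6),
    ("group of 94", 8.2, 4.8, 6),
    ("group of 82", 17.07, 2.7, 12),
]


def percent_retained(immediate_gain, later_gain):
    """Later gain expressed as a percentage of the immediate gain."""
    return 100.0 * later_gain / immediate_gain


# Whole-number percentage of the original advantage still present later.
retained = {
    label: round(percent_retained(immediate, later))
    for label, immediate, later, _weeks in PEASE_GAINS
}

for label, _immediate, _later, weeks in PEASE_GAINS:
    print(f"{label}: about {retained[label]} per cent retained after {weeks} weeks")
```

On these figures a little over half of the advantage remained after six weeks, and roughly a sixth after twelve weeks, which agrees with the reading that the gain from cramming is temporary.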

Hertzberg, Heilman, and Leunberger¹⁸
found that the experimental group, with 
which objective tests were used as a teaching 
device, did work superior to the control group 
on the immediate results of testing. How- 
ever, the group having objective tests did not 
do significantly better when tested for de- 
layed recall. 

¹⁶ Ibid., p. 429.
¹⁷ G. R. Pease, “Should Teachers Give Warning of Tests and
Examination?” Journal of Educational Psychology, XXI
(April, 1930), 273-77.

With older students examinations are
almost universally used in the schoolroom,
but the subjective forms are giving way to
the greater reliability of the objective exam- 
inations. These examinations attempt to 
measure not only the amount of factual in- 
formation acquired but also the individual’s 
retention of this information. 


The effects of examinations as incentives 
to achievement have been studied by Fay, 
Deputy, Schutte, Scott, Jersild, Kulp, and 
White. In each case the experiments have 
shown conclusively that examinations stimu- 
late the immediate achievement of students 
of various age groups. 


In addition to the stimulating effect of ex- 
aminations, the permanence of the learning 
has been studied by Gable, Keys, Pease, and 
Hertzberg, Heilman, and Leunberger. Gable 
found that examinations increased the 
achievement, but that after three months the 
superiority of the group having examinations 
had decreased. Keys found the superiority 
of a weekly-tested group over a monthly- 
tested group of students also decreased after 
a period of five to fourteen weeks. Pease 
found that the superiority of students who 
had opportunities for cramming for examina- 
tions was greatly reduced after periods of six 
and twelve weeks. Hertzberg, Heilman, and 
Leunberger found that students having objec- 
tive tests did not do significantly better when 
tested for delayed recall. 


2. THE PRESENT INVESTIGATION 


This study was undertaken for the purpose 
of determining whether informal objective 
examinations as used in a class with fresh- 
men girls at the State Normal School, Gen- 
eseo, New York, sufficiently stimulated the 
immediate achievement and the delayed re-
tention of learning to justify their continuation
for such purposes.

Fifty-five pairs of girls were equated on 
the basis of age, high school averages, and 
scores obtained on the 1936 Edition of the 
Teachers College Psychological Examination, 
Form A. These students were divided into 
two experimental groups and two control 
groups. One control group and its paired 
experimental group consisting of twenty-seven 
members worked with Dr. H. D. Behrens, 
and the other groups consisting of twenty- 
eight members worked with the writer. The 
results of the pairing are shown in Table I.


At the beginning of the experimental 
period, on October 5, 1937, all freshmen stu- 
dents were given a pre-test covering the work 
to be offered in the course in child develop- 
ment. This test consisted of 100 true-false 
statements, 80 matching items, 68 multiple- 
choice statements, and 35 completion state- 
ments. All parts of the test were based on 
the course in child development which had 
been offered by the writer the previous year 
to similar groups of students. To avoid un- 
due emotional reactions during the test, the 
students were told that they were not ex- 
pected to be able to answer all the questions 
correctly, but that they should attempt all 
questions. They were also told that their 
marks on this test would not influence their 
final grades, but that the results of the test 
would enable the instructors to determine the 
content of the course and the emphasis to be 
placed on various items. 

In order that the courses by the two in- 
structors might be as nearly alike as possible, 
mimeographed outlines of each of the seven 
units of the course were given each student. 
Each unit consisted of: (A) introduction to
the topic being studied, (B) statement of the
problem, (C) topical outline, (D) questions
to guide the study of the problem, (E) obser-
vation assignments and projects, (F) read-
ings, and (G) summary and application.
Written assignments based on student ob-
servations in the School of Practice, or
projects, were required almost every week of
all students. These took the form of essays
which were corrected by the instructors and
returned to the students for discussion.


TABLE I

PAIRING OF CONTROL AND EXPERIMENTAL GROUPS IN TERMS OF AGE,
HIGH SCHOOL AVERAGE, AND SCORES ON THE 1936 EDITION OF THE
TEACHERS COLLEGE PSYCHOLOGICAL EXAMINATION, FORM A

                              Control group                      Experimental group
                       Age        High      Psycho-        Age        High      Psycho-
                     Yrs.-Mo.    School     logical      Yrs.-Mo.    School     logical
                                 Average    score                    Average    score
        1               2           3          4            5           6          7

Mean                  18-1        79.10     168.73        18-3        79.20     168.64
Standard deviation     0-11        4.20      42.01         1-0         3.99      47.80
Standard error       ±1.48 mo.    ±.57      ±5.66       ±1.62 mo.    ±.64      ±6.44
Median                17-11       78.78     169.17        18-2        78.60     167.08
Q3                    18-8        81.37     193.13        18-9        82.31     192.45
Q1                    17-4        75.57     143.44        17-7        75.55     151.25
Q                      0-8         2.90      24.84         0-7         3.38      20.60
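
The summary figures in Table I can be checked against one another. The following Python sketch is an illustration under stated assumptions, not part of the original study: it assumes that the rows of the table are the mean, standard deviation, standard error of the mean, median, upper and lower quartiles, and Q (the semi-interquartile range), and that each group contains the fifty-five members reported in the text. The function names are hypothetical.

```python
import math

N_PAIRS = 55  # number of matched pairs reported in the text


def standard_error_of_mean(sd, n):
    """Standard error of a mean: the standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)


def semi_interquartile_range(q1, q3):
    """Q: half the distance between the lower and upper quartiles."""
    return (q3 - q1) / 2.0


def to_months(years, months):
    """Convert an age given in years and months to months."""
    return 12 * years + months


# Control group, psychological score: S.D. 42.01
se_psych = round(standard_error_of_mean(42.01, N_PAIRS), 2)

# Control group, high school average: S.D. 4.20, quartiles 75.57 and 81.37
se_school = round(standard_error_of_mean(4.20, N_PAIRS), 2)
q_school = round(semi_interquartile_range(75.57, 81.37), 2)

# Control group, age: S.D. 0-11 (11 months), quartiles 17-4 and 18-8
se_age_months = round(standard_error_of_mean(to_months(0, 11), N_PAIRS), 2)
q_age_months = semi_interquartile_range(to_months(17, 4), to_months(18, 8))

print(se_psych, se_school, q_school, se_age_months, q_age_months)
```

Under these assumptions the computed values agree with the control-group entries of the table (±5.66, ±.57, 2.90, ±1.48 months, and 8 months), which supports reading the third row as the standard error of the mean and the last row as Q.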


The members of the control groups were 
urged to study for personal achievement since 
no examinations would be given. They were 
told that their semester grades would be based 
upon their participation in class discussion 
and upon their written assignments. The ex- 
perimental groups were told that their semes- 
ter grades would be based upon their unit 
examination scores, their participation in 
class discussion, and their written reports. 

The tests given the experimental groups at 
the conclusion of each unit were the result of 
the combined efforts of the four instructors 
of the course in child development. Like the 
pre-test they were composed of true-false 
statements, four-answer multiple choice state- 
ments, matching items, and completion state- 
ments. In order to determine whether exam- 
inations, as generally employed in the course, 
sufficiently stimulated the immediate achieve- 
ment of students and the delayed retention 
of the material to warrant their use, it was 
deemed advisable to use similar forms of ob- 
jective examinations. Comparison of the 
unit examinations with the pre-test made it 
possible to avoid the verbal duplication of 
the statements, but they covered the same con-
tent as the pre-examination.

One class period was devoted to taking 
each unit examination. These papers were 
corrected and returned to the examination 
groups of students for discussion during the 
following class period. At this time each 
student was shown his comparative standing 
with relation to the rest of the group. This 
procedure gave the experimental groups the 
advantage of practice, review, and, in addi- 
tion, the incentive of rivalry or competition. 
To offset this advantage the control groups 
were given the same two class periods for 
oral review. In conducting the review no
reference was made to any of the items on
the test, as such, but the section of the mime-
ographed units containing the topical outlines
was used. These topics were summarized by
individual students, and additions and cor- 
rections were made by the group. 


At the end of the first semester of the 
school year, fifteen weeks after the pre-test 
had been administered, the experimental 
groups were requested to meet in one room 
for a final examination. At this time the 
end-test was administered. 


The control groups were again told they 
would have no examination but were re- 
quested to meet in another room for a special 
assignment. At the appointed time and place 
all members of the control groups were in- 
formed of the experiment and the apparent 
need for the falsehood. They were also told 
that their semester grades would not be in- 
fluenced by the scores which the individuals 
in the experimental group received on the 
test which they were now to take. 


During the second semester both the ex- 
perimental and control groups were given in- 
formal objective examinations on the comple- 
tion of each unit studied. These were cor- 
rected and then returned to the students for 
one class period of discussion. They were 
then collected by the instructors. The con- 
trol group now had the opportunity to derive 
any benefits which examinations might afford. 
The course offered during the second semes- 
ter, entitled child behavior, was a continua- 
tion of the first semester course, and the 
treatment of the various topics would neces- 
sarily tend to review and fix much that had 
been undertaken during the first semester. 
Twelve school weeks after the administration 
of the end-test, all freshmen students were
requested to appear at a certain specified 
time and place to take standardized tests in 
the elementary school subjects to determine 
whether or not they could meet the required 
standards in these subjects for assignment to 
student teaching. The students were given 
standardized handwriting and spelling tests, 
but the time they expected to use for qual- 
ifying tests in arithmetic and the social 
studies was devoted to taking the post-test. 
Thus, it is not at all likely that any of the 
students in either the experimental or the 
control groups reviewed or crammed for the 
post-test. 

When the end-test was administered, some 
members of the control group gave evidence 
of expecting an examination. Therefore, a 
check-list of thirteen items was filled out dur-
ing the next class period. The results of
this check-list showed that 45 per cent of the
students had anticipated that an examination
might be administered, but that only 32 per
cent had reviewed or crammed for the exam-
ination. The average amount of time spent
in cramming was approximately two and one-
half hours per individual. The mean achieve-
ment of this group on the end-test may have
been significantly increased over what it 
would have been had they not crammed or 
reviewed. 


Approximately half of the students of the 
control group expressed a dread of examina- 
tions for fear of failure, and approximately 
three-fourths of them believed that class par- 
ticipation and grades obtained on such obser- 
vation assignments and projects as they had 
been given were adequate means for deter- 
mining their grades in the course. While 
only 50 per cent believed they would have 
studied harder if they had been given exam- 
inations, 77 per cent believed that examina- 
tions helped them organize the knowledge 
gained in a course, and 85 per cent said they 
usually studied after an examination to clear 
up doubtful points. 


The individual scores on the psychological 
test were weighted by their Q’s and combined 
to give them equal weight. The means for 
the weighted scores, the high school averages, 
pre-test, end-test, gains on the end-test, post- 
test and gains on the post-test over the end- 
test are shown in Table II. 

The difference between the means of the 
control and experimental groups on the end- 
test, gains on the end-test over the pre-test, 
post-test, and gains on the post-test over the 
end-test were considered the effect of the 
experimental factor. These results are re- 
ported in Table III. 

Comparison of the means shows that the 
experimental group did better on the pre-test 
than did the control group, but the differ-
ence 6.91 ± 2.76 is not statistically signif-
icant, since the significance ratio is only 2.50.
The results of the end-test show a mean dif-
ference of 8.66 ± 2.49 points with a sig-
nificance ratio of 3.49. This shows that the
examinations given at the end of each of the 
seven units of the course in child develop- 
ment had been sufficiently stimulating to 
make a reliable difference in the experimental 
group over the control group which had had 
no examinations. The differences between
the mean gains on the end-test and the mean
scores on the post-test are not statistically
significant. These differences may be due to
chance, the greater homogeneity of the con-
trol group, or to the fact that on the pre-test
the control group had a lower mean score
than the experimental group and it would be
easier to make the greater gain on a low than
a high mean average.


TABLE II

WEIGHTED PSYCHOLOGICAL TEST SCORES AND HIGH SCHOOL AVERAGES,
PRE-TEST SCORES, END-TEST SCORES, GAIN ON END-TEST,
POST-TEST SCORES, AND GAIN ON POST-TEST OVER END-TEST

[Table body not preserved. Columns, given for the control group and for
the experimental group: weighted score, pre-test score, end-test score,
gain on end-test, post-test score, gain on post-test.]


TABLE III

COMPARISON OF MEAN DIFFERENCES OF PRE-TEST, END-TEST, GAIN ON END-TEST,
POST-TEST, AND GAIN ON POST-TEST OVER END-TEST*

                                          Mean                      Correlation   Standard     Significance
                                   Control   Experi-   Difference   of matched    error of     ratio
                                   group     mental    of means     pairs of      mean         D / σ diff.
                                             group                  control and   difference
                                                                    experimental
                                                                    groups
               1                      2         3          4            5             6             7

1. Pre-test                         73.91     80.82       6.91         .41          ±2.76         2.50
2. End-test                        123.76    132.42       8.66         .45          ±2.48         3.49
3. Gain on end-test                 49.85     51.60       1.75         .26          ±2.88          .61
4. Post-test                       134.48    138.17       3.69         .24          ±2.98         1.2
5. Gain on post-test over end-test  10.72      5.75      -4.97         .001         ±2.54         1.96

*The standard error of the mean difference takes into account the matched pair correlation of the
control and experimental groups. Thus:

$$\sigma_{\text{diff.}} = \sqrt{\sigma_{m_1}^{2} + \sigma_{m_2}^{2} - 2\,r_{12}\,\sigma_{m_1}\sigma_{m_2}}$$
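
The significance ratios of Table III can be recomputed from the reported means and standard errors. The Python sketch below is offered only as an illustration: the function names are hypothetical, the criterion of 3.0 for a reliable difference is an assumption (the text treats a ratio of 2.50 as not significant and one of 3.49 as significant, but states no threshold), and the standard errors used to demonstrate the correlated-means formula are invented, since the standard errors of the separate group means are not given.

```python
import math

# Rows of Table III: (measure, control mean, experimental mean,
#                     standard error of the mean difference)
TABLE_III = [
    ("Pre-test", 73.91, 80.82, 2.76),
    ("End-test", 123.76, 132.42, 2.48),
    ("Gain on end-test", 49.85, 51.60, 2.88),
    ("Post-test", 134.48, 138.17, 2.98),
    ("Gain on post-test over end-test", 10.72, 5.75, 2.54),
]

# ASSUMPTION: a ratio of 3.0 or more is taken as a reliable difference.
RELIABLE_RATIO = 3.0


def se_of_difference(se_1, se_2, r_12):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2.0 * r_12 * se_1 * se_2)


def significance_ratio(control_mean, experimental_mean, se_diff):
    """Absolute difference of the means divided by its standard error."""
    return abs(experimental_mean - control_mean) / se_diff


ratios = {
    measure: round(significance_ratio(control, experimental, se), 2)
    for measure, control, experimental, se in TABLE_III
}
reliable = [measure for measure, ratio in ratios.items() if ratio >= RELIABLE_RATIO]

for measure, ratio in ratios.items():
    print(f"{measure}: {ratio}")
print("Reliable at the assumed criterion:", reliable)

# Hypothetical standard errors of 3.0 and 4.0, for illustration only:
# pairing that yields a positive correlation shrinks the error of the difference.
print(se_of_difference(3.0, 4.0, 0.0), se_of_difference(3.0, 4.0, 0.45))
```

With the assumed criterion only the end-test difference qualifies, which matches the interpretation given in the text. The last line shows why matching matters: the larger the correlation between paired scores, the smaller the standard error of the difference between the group means.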

The difference between the mean gains on 
the post-test over the end-test, 4.97 ± 2.54,
was in favor of the control group. Since this
is not statistically significant, the difference
may be due to chance, or it may indicate
that the examinations which had been admin-
istered during the interval of time between
the end-test and post-test had become a
greater incentive to achievement or a greater
aid to learning for the control group than they
continued to be for the experimental group. Again, the lower
mean score on the pre-test and the greater 
homogeneity of the control group may be 
other explanations. 

On the assumption that the control and 
experimental groups were adequately equated 
by their ages, high school averages, and psy- 
chological test scores and that such informal 
objective examinations as were used in the 
pre-test, end-test, and post-test were adequate 
to measure differences in the achievement of 
the two groups, the results of this study show 
that the use of examinations at the end of 
each unit of the course stimulated achieve- 
ment to a significant degree. However, the 
results of this study do not show that the 
greater achievement of the examination group
persisted to a significant degree after an
interval of twelve school weeks.


On the above assumptions, the results of 
this study confirm the results of similar 
studies which have shown that examinations 
used as incentives to achievement produce 
statistically significant results. They also
confirm previous studies in that there is as
yet no evidence to show that the greater
achievement which has been induced by ex-
aminations persists after six weeks to three
months.


3. SUMMARY AND CONCLUSION 


This study was undertaken for the purpose 
of determining whether informal objective 
examinations as used in a class with fresh- 
men girls at the State Normal School, Gen- 
eseo, New York, sufficiently stimulated the 
immediate achievement and the delayed re- 
tention of learning to justify their continua- 
tion for such purposes. 


Fifty-five pairs of freshmen girls were 
equated on the basis of age, high school aver- 
ages, and scores obtained on a psychological 
test. One control group consisting of twenty- 
seven students and their paired experimental 
group worked with one instructor, while the 
other control and experimental groups con- 
sisting of twenty-eight members each worked 
with the writer. 


At the beginning of the experimental 
period an informal objective examination 
consisting of 253 items was administered to 
all students as the pre-test. 

The experimental groups were told that
their semester grades would be based upon 
the examination scores they obtained at the 
end of each unit, their participation in class 
discussions, and the written assignments 
which would be required weekly. The con- 
trol group was urged to study for personal 
achievement, since no examinations would be 
given. They were told that their semester 
grades would be based on their participation 
in class discussion and the written assignments
which would be required weekly. 


The control and experimental groups were 
given mimeographed outlines of each unit of 
the course in child development. At the end 
of each unit the experimental groups were 
given informal objective examinations cover- 
ing the material of the unit. The control 
groups were given an equal amount of time 
for student conducted oral review. 


To determine the difference in achieve- 
ment between the control and experimental 
groups the pre-test was re-administered at the 
end of the semester. To determine the 
amount of retention of the subject matter, 
this same test was administered twelve school 
weeks later as the post-test. The difference 
in the mean scores of the control and experi- 
mental groups showed that on the end-test 
the experimental group did significantly bet- 
ter than the control group. The difference 
in mean gains on the end-test over the pre- 
test, mean scores on the post-test, and mean 
gains on the post-test over the end-test were 
not statistically significant. This failure to 
find a significant gain may have been due to 
chance, the lower mean score of the control 
group on the pre-test, or the greater homo- 
geneity of the group. The fact that the 
greater mean gain on the post-test over the 
end-test was made by the control group may 
have been due to the stimulating influence of 
the informal objective examinations which 
had been given during the twelve weeks’ in-
terval of time between the end-test and the
post-test. 
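
The gain comparisons described above can be sketched in a few lines. This is only an illustration: the scores below are hypothetical, the names `mean_gain`, `pre_test`, `end_test`, and `post_test` are assumptions made for the example, and the study's own data are not reproduced here.

```python
from statistics import mean


def mean_gain(earlier, later):
    """Mean of per-student gains (later score minus earlier score)."""
    if len(earlier) != len(later):
        raise ValueError("score lists must be paired student by student")
    return mean(b - a for a, b in zip(earlier, later))


# Hypothetical scores for four students on the 253-item test.
pre_test = [80, 95, 110, 70]
end_test = [150, 160, 175, 140]
post_test = [140, 155, 170, 138]

# Gain on the end-test over the pre-test (achievement during the semester).
achievement_gain = mean_gain(pre_test, end_test)

# Gain on the post-test over the end-test (a negative value means loss).
retention_change = mean_gain(end_test, post_test)
```

The same function serves both comparisons, since each is a mean of paired differences between two administrations of the same test.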

On the assumption that the control and 
experimental groups were adequately equated 
and that the informal objective examination 
used as pre-test, end-test, and post-test was 
adequate to measure differences in achieve- 
ment, the results of this study confirm the 
findings of previous studies which have shown 
that the use of examinations stimulates 
achievement to a significant degree. The re-
sults of this study also confirm the findings
of previous studies in that there is as yet no 
evidence to show that the greater achieve- 
ment which has been induced by examina- 
tions persists after six weeks to three months. 


4. FURTHER RESEARCH 


In order to determine satisfactorily the 
significance of examinations for achievement 
and retention of learning, further research is 
necessary. It is suggested that an experi- 
ment be conducted covering the entire ele- 
mentary school period, the entire high school 
period, or the entire college period. The con- 
trol and experimental schools participating in 
the experiment should be as nearly alike as 
possible in geographical location, socio- 
economic background of students, instruc- 
tional staff, size of classes, curriculum, and 
methods. The experimental factor would be 
the use or absence of tests and examinations 
for marking purposes. Periodic checks might 
be obtained by the use of achievement tests 
in the subject matter fields, but these results 
in the control system would be used only for 
diagnostic and remedial purposes. At the 
end of the elementary school period, high 
school period, or college period, as the case 
may be, the administration of standardized 
achievement tests would show more ade- 
quately the long time effect of the value of 
examinations as incentives to achievement 
and retention. 










AN ANALYSIS OF THE NUMBER KNOWLEDGE OF 
FIRST-GRADE PUPILS ACCORDING TO 
LEVELS OF INTELLIGENCE 


ALBERT GRANT 


Psychological Laboratory 
Public Schools, Cincinnati, Ohio 


Several studies have been made of the 
number knowledge of children when they 
enter Grade I. These studies are of value in 
planning number instruction programs gener- 
ally suitable in this grade. There is need for 
related studies which will aid in differentiat- 
ing such programs for the different ability 
groups within the grade. Since ability groups 
are usually formed according to levels of in- 
telligence, an inventory of the number facts 
and skills possessed by pupils of varying in- 
telligence levels is needed. The present study 
attempts to meet this need by analyzing ac- 
cording to levels of intelligence the results of 
a test of number knowledge given to over 
five hundred beginning first-grade pupils.

The measure of number knowledge used 
for this study was Test 5, Numbers, Metro-
politan Readiness Tests.¹ This test contains
a total of forty items which, according to the 
authors, were selected to explore various as- 
pects of the number abilities possessed by 
pupils in Kindergarten and Grade I. The 
Pintner-Cunningham Primary Mental Test²
was used as a measure of intelligence. The 
subjects for the study consisted of 563 white, 
beginning first-grade pupils enrolled in nine 
different public elementary schools in Cincin- 
nati, Ohio. The schools were selected so that 
a variety of economic backgrounds would be 
represented. All pupils were six or approxi-
mately six years of age in September, 1935, the
month during which both tests were given. 


The results of the test of number knowl-
edge were analyzed separately for the pupils
in each of three intelligence groupings. These
groupings were based on the intelligence quo-
tients of the pupils as determined by the
Pintner-Cunningham test. The groups are
made up as follows:


(1) I.Q.’s below 90 (145 pupils)
(2) I.Q.’s from 90 through 109 (252 pupils)
(3) I.Q.’s of 110 or above (166 pupils)


In the discussion which follows these groups
are referred to as “dull”, “average”, and
“bright”, respectively. Information on vari-
ous aspects of the number knowledge of the
pupils in each of these groups is given in the
tables and discussion which follow.


¹ Gertrude H. Hildreth and Nellie L. Griffiths, Metropolitan Readiness Tests. Yonkers-on-Hudson, New York: World Book Co.

² Rudolf Pintner and Bess V. Cunningham, Pintner-Cunningham Primary Mental Test. Yonkers-on-Hudson, New York: World Book Co., 1923.
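
The three groupings can be stated as a simple rule. The sketch below is illustrative only: the function name `intelligence_group` is an assumption, and it presumes whole-number intelligence quotients, as the stated boundaries (below 90, 90 through 109, 110 or above) suggest.

```python
# Group sizes as reported in the study.
GROUP_SIZES = {"dull": 145, "average": 252, "bright": 166}


def intelligence_group(iq):
    """Assign a pupil to one of the three groupings by intelligence quotient."""
    if iq < 90:
        return "dull"
    if iq <= 109:
        return "average"
    return "bright"


# The three groups together account for all 563 pupils.
total_pupils = sum(GROUP_SIZES.values())
```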


RELATION OF ABILITY TO COUNT AND 
LEVEL OF INTELLIGENCE 


The test of number knowledge (Metro- 
politan) used contains eight items which 
measure knowledge of rational counting. The 
pupils’ ability to do such counting is deter- 
mined by items requiring the following types 
of responses: (1) marking a given number of 
objects, (2) selecting the square which con- 
tains a given number of dots. The pupils’ 
responses are secured by instructions such as 
the following: (1) “Mark four of the houses”,
(2) “Mark the third pig”, (3) “Mark the box
with eight dots in it”. The success of the
“dull”, “average”, and “bright” first-grade
pupils in counting, as measured by the test
items described, is shown in Tables I and II.
The per cent of pupils who responded cor-
rectly is shown separately for each group and
each test item. It is evident from the tables
that the “bright” pupils knew considerably 
more about counting than the “average” 
pupils and the latter group surpassed the 
“dull” pupils. The counting of objects using 
cardinal numbers seems generally to be better 
understood than the counting of objects in 
ordinal relationship. However, the latter 
ability, among beginning first-grade pupils of 
the same age, is also closely related to their 
level of intelligence.


TABLE I


COUNTING:* SUCCESS OF BEGINNING FIRST-
GRADE PUPILS OF VARYING
INTELLIGENCE LEVELS


                 Per cent of pupils who counted correctly

                 “Dull”       “Average”     “Bright”
Extent of        Below        90-109        110 I.Q.
counting         90 I.Q.      I.Q.          and above
                 N=145        N=252         N=166

[illegible]        57           64            87
7 objects          39           58            83
18 objects         19           40            64
3rd object         19           39            71
6th object         21           44            67


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.


TABLE II


SELECTING SQUARES HAVING A GIVEN NUMBER
OF DOTS:* SUCCESS OF BEGINNING FIRST-
GRADE PUPILS OF VARYING
INTELLIGENCE LEVELS


                 Per cent of pupils who selected correct square

Number of dots   “Dull”       “Average”     “Bright”
in square        Below        90-109        110 I.Q.
selected         90 I.Q.      I.Q.          and above
                 N=145        N=252         N=166

[illegible]        54           82            96
[illegible]        19           43            61
[illegible]        54           73            86


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.
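
Because the three groups differ in size, a percentage for all 563 pupils cannot be obtained by averaging the three group percentages directly; each must be weighted by its N. The sketch below shows the arithmetic for the first row of Table I (57, 64, and 87 per cent). The function name `overall_percent` is an assumption for illustration, and the combined figure is not one reported in the study.

```python
# Group sizes as reported in the tables.
GROUP_SIZES = {"dull": 145, "average": 252, "bright": 166}


def overall_percent(percent_by_group, sizes=GROUP_SIZES):
    """Combine group percentages into one percentage, weighting by group size."""
    total = sum(sizes.values())
    weighted = sum(percent_by_group[group] * sizes[group] for group in sizes)
    return weighted / total


# First row of Table I: per cent of each group who counted correctly.
first_row = {"dull": 57, "average": 64, "bright": 87}
combined = overall_percent(first_row)
```

The unweighted average of the three percentages would be slightly different, because the “average” group is the largest and pulls the combined figure toward its own value.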


RELATION OF ABILITY TO IDENTIFY, WRITE, 
AND INTERPRET NUMBERS, AND LEVEL 
OF INTELLIGENCE 


The test of number knowledge used in- 
cludes a total of twelve items which measure 
ability to: (1) recognize numbers, (2) write 
numbers, and (3) interpret numbers. The 
nature of these items is illustrated by the fol- 
lowing examples: (1) “Look at the row of 
numbers where the hand is. Mark the 4.” 
(2) “Look at the box where the star is. 
Make a number 2 in that box.” (3) “Make 
as many dots as the number tells you to 
make.” The success of the pupils in respond- 
ing to each of the twelve items is shown ac- 
cording to levels of intelligence in Tables 
III, IV, and V. As in the case of counting, 
the ability of first-grade children of the same 
age to recognize, write, and interpret numbers 
is definitely related to their level of intelli-
gence. The writing and the interpreting of
numbers are generally more difficult for all
pupils than identifying or recognizing
numbers.


RELATION OF ABILITY TO ADD, SUBTRACT,
AND MULTIPLY, AND LEVEL OF 
INTELLIGENCE 


The number test used contains several 
items which measure ability to perform sim- 
ple addition, subtraction, and multiplication 
problems. Pupils’ ability along these lines is
measured by their responses to items such as
the following: “If I had three blocks and
daddy gave me four more, mark all the blocks
I would have then.” The success of the
pupils is shown in Table VI. There is a defi-
nite relation between ability to add, subtract,
and multiply, and level of intelligence. In 
general, the “dull” pupils showed practically 
no success in dealing with these processes. 
About one-third of the “average” group and
about one-half of the “bright” group mastered
the addition and subtraction problems.


TABLE III


IDENTIFICATION OF WRITTEN NUMBERS:* SUC-
CESS OF BEGINNING FIRST-GRADE PUPILS
ACCORDING TO LEVELS OF
INTELLIGENCE


                 Per cent of pupils who identified number correctly

                 “Dull”       “Average”     “Bright”
Number           Below        90-109        110 I.Q.
identified       90 I.Q.      I.Q.          and above
                 N=145        N=252         N=166

[illegible]        26           60            83
[illegible]        34           48            63
[illegible]        19           40            67
[illegible]        21           43            62


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.


TABLE IV


WRITING NUMBERS:* SUCCESS OF BEGINNING
FIRST-GRADE PUPILS ACCORDING TO
LEVELS OF INTELLIGENCE


                 Per cent of pupils who wrote number correctly

                 “Dull”       “Average”     “Bright”
Number           Below        90-109        110 I.Q.
written          90 I.Q.      I.Q.          and above
                 N=145        N=252         N=166

[illegible]        12           30            58
[illegible]         6           17            40
[illegible]         9           27            55
[illegible]         8           22            61
[illegible]         1            9            31
[illegible]         3           21            54


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.


TABLE V


INTERPRETING WRITTEN NUMBERS:* SUCCESS
OF BEGINNING FIRST-GRADE PUPILS
ACCORDING TO LEVELS OF
INTELLIGENCE


                 Per cent of pupils who interpreted number correctly

Written          “Dull”       “Average”     “Bright”
number           Below        90-109        110 I.Q.
to be            90 I.Q.      I.Q.          and above
interpreted      N=145        N=252         N=166

4                  16           40            70
[illegible]         9           33            61


* Based on items in Test 5, Numbers, Metro-
politan Readiness Tests requiring the making
of dots totalling the written number.


TABLE VI


ADDITION, SUBTRACTION, AND MULTIPLICATION
PROBLEMS:* SUCCESS OF BEGINNING FIRST-
GRADE PUPILS ACCORDING TO LEVELS
OF INTELLIGENCE


                 Per cent of pupils who solved problem correctly

                 “Dull”       “Average”     “Bright”
                 Below        90-109        110 I.Q.
Problem          90 I.Q.      I.Q.          and above
                 N=145        N=252         N=166

one plus two       32           56            67
three plus four    12           36            61
six plus six       11           30            50
five minus two     17           39            54
three minus one    19           44            69
ten minus one      17           26            35
three times two     0.4         18            31


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.


RELATION OF KNOWLEDGE OF GEOMETRICAL 
FORMS AND NUMBER VOCABULARY, AND
LEVEL OF INTELLIGENCE 


The test used contains three items which 
measure ability to recognize geometrical 
forms and eight items which measure under- 
standing of various number terms. Examples 
of the instructions to pupils concerning these 
items are as follows: (1) “Mark the square,” 
(2) “Mark the middle hat.” The success of 
the pupils on the various items is shown in 
Tables VII and VIII. The tables show that 
the pupils’ understanding of the aspects of 
number represented by the items described 
is definitely related to their level of intelli- 
gence. Furthermore, Table VII shows that 
the pupils were more familiar with the square
and the circle than with the triangle. Table
VIII also shows that a majority of the pupils
at each level of intelligence can interpret the
number terms listed with the exception of the
terms “half” and “as long as.”


TABLE VII


RECOGNIZING GEOMETRICAL FORMS:* SUCCESS
OF BEGINNING FIRST-GRADE PUPILS OF
VARYING INTELLIGENCE LEVELS


                 Per cent of pupils who selected correct form

                 “Dull”       “Average”     “Bright”
Geometrical      Below        90-109        110 I.Q.
form             90 I.Q.      I.Q.          and above
selected         N=145        N=252         N=166

[illegible]        54           79            92
[illegible]        48           76            85
triangle       [illegible]      19            34


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.


TABLE VIII


NUMBER VOCABULARY:* SUCCESS OF BEGINNING
FIRST-GRADE PUPILS OF VARYING
LEVELS OF INTELLIGENCE


                 Per cent of pupils who interpreted term correctly

                 “Dull”       “Average”     “Bright”
Number           Below        90-109        110 I.Q.
term             90 I.Q.      I.Q.          and above
                 N=145        N=252         N=166

longest            92           99           100
middle             84           96            98
shortest           74           94            98
tallest            71           89            98
widest             72           85            93
smallest           66           91            93
half               55           76            83
as long as         24           50            68


* Based on Test 5, Numbers, Metropolitan
Readiness Tests.


SUMMARY


This investigation has analyzed according
to levels of intelligence the responses of 563
beginning first-grade pupils to a test of num-
ber knowledge (Test 5, Metropolitan Readi-
ness Tests). The findings may be summarized
in the following general statements:


1. Over one-third of the “dull” pupils
(I.Q. below 90) can count objects which total
less than ten. About one-fifth understand
ordinal numbers below the tenth. About one-
fourth can identify the symbols for numbers
under ten, but less than ten per cent can
write or interpret such symbols. About one-
half can identify a square and a circle, but
practically none can identify a triangle. Less 
than one-fifth can solve simple problems in 
addition and subtraction and practically none 
can solve problems in multiplication. 

2. Over one-half of the “average” pupils 
(I.Q. 90-109) can count objects up to ten,
and about one-third understand ordinal num- 
bers below the tenth. Over one-third can 
identify the symbols for numbers under ten, 
and somewhat less than one-third can write 
or interpret such symbols. About three- 
fourths can identify a square and a circle, 
and about one-fifth can identify a triangle. 
About one-third can solve simple problems in 
addition and subtraction, and a few can solve 
simple problems in multiplication. 

3. Over two-thirds of the “bright” pupils 
(I.Q. 110 and above) can count objects up
to ten, and a slightly smaller proportion un- 
derstand ordinal numbers below the tenth. 
Over one-half can identify the symbols for 
numbers under ten, and nearly one-half can 
write and interpret such symbols. Nearly 
all can identify a square and a circle, and
about one-third can identify a triangle. About
one-half can solve simple problems in addi-
tion and subtraction, and nearly one-third
can solve simple problems in multiplication.

4. Over two-thirds of the “dull,” over
four-fifths of the “average,” and practically
all of the “bright” understand the following
number terms: “longest,” “middle,” “short-
est,” “tallest,” “widest,” and “smallest.” The
terms “half” and “as long as” are understood
by smaller proportions in each group.


EDUCATIONAL IMPLICATIONS OF STUDY


It is generally agreed that instruction in 
arithmetic should be differentiated to meet 
the needs of pupils of varying levels of intelli- 
gence. Such differentiation should include 
starting the teaching at the point where the 
pupils’ previous development has brought
them. The present study should prove use- 
ful in this connection, since it gives, according 
to levels of intelligence, information on the 
previous development in number knowledge 
of first-grade pupils.


AN INVESTIGATION INTO THE RELATIVE EFFECTIVENESS
OF THREE DIFFERENT METHODS OF TEACHING
GENERAL BIOLOGY IN A NORMAL SCHOOL¹


ELMO NALL STEVENSON
Eastern Oregon Normal School 


This investigation was concerned with an- 
swering the following questions: (1) Which 
of three methods—lecture, lecture-discussion, 
and experimental—is most effective in attain- 
ing the objectives of the survey course in gen- 
eral biology? (2) Which method is most 
efficient with respect to: immediate and de- 
layed retention of subject matter, ability to 
apply principles, ability to solve problems, 
and skill in performing laboratory work? 
(3) What is the relationship between achieve- 
ment for the various levels and the level of 
intelligence? (4) Are the experimental meth- 
ods applicable in science courses on the college 
level? 


ORGANIZATION OF THE STUDY 


The investigation consisted of three studies. 
The first was conducive to the setting up of 
hypotheses which were checked by two sub- 
sequent studies. 

The first study was a rotation experiment 
conducted in the Eastern Oregon Normal 
School during the 1934-35 school year. Five 
classes of 117 students, organized into “abil- 
ity-achievement” groups, were taught fifteen 
different units by the same teacher using the 
three different methods. Four classes were 
subjected to the rotation technique, while the 
fifth served as a control. The rotation pro- 
cedure exposed each group to each method 
five times. A two-week rotation plan was 
used for the first two quarters, and a four- 
week one during the third. Superiority of 
method was indicated when the mean of gain 
points of any “ability-achievement” group 
was consistently higher than a superior “abil- 
ity-achievement” group’s performance under 
the same method.²
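
One way to see how four rotated classes can each meet every method five times over fifteen units is a simple cyclic schedule. The sketch below is an assumption made for illustration: the article does not give the actual order of rotation, and the names `rotation_schedule` and `exposures` are hypothetical.

```python
from collections import Counter

METHODS = ("lecture", "lecture-discussion", "experimental")


def rotation_schedule(n_groups=4, n_units=15):
    """Assign a method to every (group, unit) pair by cycling through METHODS.

    Each group starts at a different point in the cycle, so that in any one
    unit the groups are not all taught by the same method.
    """
    return {
        group: [METHODS[(unit + group) % len(METHODS)] for unit in range(n_units)]
        for group in range(n_groups)
    }


schedule = rotation_schedule()

# Count how many times each group meets each method.
exposures = {group: Counter(methods) for group, methods in schedule.items()}
```

With fifteen units and three methods, any strict cycle gives each group exactly five exposures to each method, which matches the figure stated above.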

The second study was a comparison of
three equivalent groups taught by one teacher
using different methods in the Eastern Ore-
gon Normal School from 1933-36. Each
group consisted of several classes. A differ-
ent method was used for each of the three
years. More than 350 students participated.

¹ [illegible] on file in the [illegible].

² [illegible], “Determining the [illegible] Methods of Teaching,” Journal of Educational [illegible] ([illegible] 1929), 255-64.

The third study was a comparison of three 
equivalent groups taught by three different 
teachers in three comparable schools during 
1935-36. The methods of instruction used 
were: lecture in the Oregon Normal, lecture- 
discussion in the Southern Oregon Normal, 
and experimental in the Eastern Oregon Nor- 
mal. Over five hundred students in seven- 
teen classes were involved. 


In the second and third studies effective- 
ness of method was determined by consistency 
of mean differences favoring a given method 
and by significant differences between per- 
formance means. Comparisons were made of 
paired, ability, and class groups. 


METHODS OF TEACHING 


Method was used in its broad sense. The 
three organized teaching procedures were 
defined as follows: 


The lecture is a procedure by which the 
teacher gives expression of his knowledge and 
experience on the topic to the class. The lec- 
turer initiates, organizes, presents, interprets, 
and summarizes the pertinent information, 
using whatever techniques of the lecture 
method he can command. 

In the lecture-discussion procedure, the
teacher gives expression of his knowledge and 
experience to the class, solicits questions from 
the students, asks questions, and encourages 
student response. The teacher lectures sixty- 
five percent or more of the class period, but 
never for the full period. 

The experimental method consists of a 
problem solving approach to the unit of in- 
struction, involving a wide selection of activi- 
ties and instructional procedures. The prob- 
lem solving approach includes: definition of 
problem, synthesis of knowledge concerning 
the topic, tentative generalizations, contribu-
tions from literature, experience, class, and
laboratory activities, verification of generali- 
zations, and associated unsolved problems. 
Initiation, organization, presentation, inter- 
pretation, and summaries of the topic are 
principally student made. 


FORMATION OF “ABILITY-ACHIEVEMENT”
AND PAIRED GROUPS 


In the first study, “ability-level” groups 
were formed upon the basis of raw scores on 
the American Council on Education Psycho-
logical Examination³ at the beginning of the
school year. Raw scores on the achievement
pretest (Ruch-Cossmann Biology Test, Ex-
amination: Form A⁴) and previous quarter
grade points supplemented and checked the 
psychological test criterion. 


The students in the comparisons in the sec- 
ond study were paired individual with indi- 
vidual on the bases of primary and secondary 
criteria. The primary criteria were: (1) raw 
scores on the American Council on Education 
Psychological Examination, and (2) raw 
scores on the Ruch-Cossmann Biology Test, 
Examination: Form A. Secondary criteria 
included chronological age, sex, intended and 
parental occupations, previous science courses, 
size of high school from which students gradu- 
ated, size of home community, proximity to 
school, and hobbies in natural science. The 
groups were remarkably well equated, since 
the differences between the means on the tests 
were well within the standard error of the
difference. 
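
The individual-with-individual pairing on the two primary criteria can be sketched as a nearest-match search. This is a hedged illustration only: the article does not state the pairing rule or tolerance used, so the greedy procedure, the `max_gap` value, and the student records below are all assumptions.

```python
def pair_students(group_a, group_b, max_gap=3):
    """Pair each student in group_a with the closest unused student in group_b.

    Students are (name, psychological_score, biology_score) tuples. A pair is
    accepted only if the two students differ by no more than max_gap points on
    each of the two tests; unmatched students are left out.
    """
    unused = list(group_b)
    pairs = []
    for a in group_a:
        candidates = [
            b for b in unused
            if abs(a[1] - b[1]) <= max_gap and abs(a[2] - b[2]) <= max_gap
        ]
        if not candidates:
            continue
        best = min(candidates, key=lambda b: abs(a[1] - b[1]) + abs(a[2] - b[2]))
        pairs.append((a[0], best[0]))
        unused.remove(best)
    return pairs


# Hypothetical raw scores on the two primary criteria.
group_a = [("A1", 120, 40), ("A2", 95, 30), ("A3", 150, 55)]
group_b = [("B1", 96, 31), ("B2", 121, 42), ("B3", 100, 20)]
pairs = pair_students(group_a, group_b)
```

Secondary criteria such as age or previous science courses could be added as further conditions on the candidate list; they are omitted here to keep the sketch short.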


COURSES OF STUDY


A uniform course of study was pursued, 
with emphasis as follows: first quarter, general 
principles and introduction to lower plants 
and animals; second quarter, elaboration of 
general principles and study of higher or- 
ganisms; third quarter, evolution, heredity, 
and eugenics. The subject matter was or- 
ganized into units in which the time allot- 
ments and emphasis upon salient aspects were 
approximately equal. 


Objectives of the courses, as rated by the 
instructors involved, showed almost a unanim- 
ity of purpose. 

³ L. L. Thurstone and Thelma Gwinn Thurstone, Psychological Examination. Washington: The American Council on Education, 1934.

⁴ Giles M. Ruch and Leo H. Cossmann, Ruch-Cossmann Biology Test. Yonkers, New York: World Book Company, 1924.


MEASUREMENT OF OUTCOMES


An attempt was made to measure all pur- 
ported outcomes of a course in general biol- 
ogy. Standardized and teacher-made objec- 
tive tests were used, as well as criteria to 
determine attitudes. 


I. Measurement of achievement 
A. Standardized tests in general biology 


1. The Ruch—Cossmann Biology Test, 
Examination: Form A was given at 
the initial class meeting. The scores 
supplemented the formation of “abil- 
ity-achievement” groups, checked 
achievement comparisons between 
classes of different years under dif- 
ferent methods, and served as a 
primary criterion for pairing stu- 
dents. Examination: Form B was 
given at the end of the second quar- 
ter. It was used as a measure of 
immediate retention of knowledge. 
The difference in scores on Form B 
over those on Form A was used as a 
criterion for measuring achievement. 

2. The Cooperative Biology Test⁵ was
given at the final class meeting. 
The results were considered as a 
measure of delayed retention of 
information. 


B. Teacher-made objective tests 


1. Unit tests: In the first study, a 
pre-test and final completion test 
were constructed for each unit. The 
differences of the final scores over 
the initial scores constituted gain 
points. The coefficient of reliabil- 
ity was computed by using the 
Spearman—Brown prophecy formula 
which gave correlations for most of 
the tests of over .80. In the second 
study the unit tests served as checks 
on the instruction. 


2. Mid-quarter and final tests: In 
part two of the second study, each 
teacher constructed a composite test 
from contributions by all instruc- 
tors involved. Three such tests were 
used.


3. Application of principles, problem
solving, and laboratory skills tests:
These tests included: (1) a multi-
ple-choice type consisting of situa-
tions to which principles specifically
applied, (2) unfamiliar problem
situations designed to solicit the
most reasonable generalization, and
(3) items designed to measure accu-
racy of observation and the ability
to perform in the laboratory.


⁵ F. L. Fitzpatrick and S. R. Powers, Cooperative Biology Test. New York: Cooperative Test Service, 1933, 1935.


II. Measurement of attitudes and interests 


Students reacted to a questionnaire and to 
interviews in terms of which method was 
liked best and functioned best under various 
circumstances. Interest in the subject was 
determined by the relative number of students 
who availed themselves of the alternative for 
the third quarter of biology, and the relative 
amount of time devoted to the study of the 
subject. Comparative observation notes were 
taken by the investigator throughout the 
experiment. 
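
The Spearman-Brown prophecy formula mentioned under the unit tests estimates the reliability of a lengthened test from the correlation between its parts. The sketch below assumes the common split-half use, in which the correlation between two halves is stepped up to full length (k = 2); the article does not say how the halves were formed, and the value 0.70 is hypothetical.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability of a test lengthened k times.

    r is the correlation between comparable parts (for example, two halves),
    and k is the factor by which the test length is multiplied.
    """
    return k * r / (1 + (k - 1) * r)


# A hypothetical half-test correlation of .70 stepped up to full length.
full_length_reliability = spearman_brown(0.70)
```

On this assumption, a half-test correlation of about .67 or higher would be enough to yield the full-length coefficients of over .80 reported for most of the unit tests.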


FINDINGS 


In the first study there were no differences 
in mean gains that were statistically signifi- 
cant. The differences were attributed to 
chance. The degree of variability in raw 
scores decreased with each successive topic. 
No one “ability-achievement” group under 
any particular method consistently surpassed 
a superior “ability-achievement” group. The 
performance of each was consistent with its 
mental ability. 

Mean gains were rated and ranked. Total 
ratings and rankings based on a point system 
indicated that the experimental method 
slightly surpassed the lecture-discussion, while 
the latter was slightly superior to the lecture 
procedure. 

Comparisons in terms of all mean gains for 
each group under each method were made. 
No actual differences were larger than three 
standard errors of the difference between the 
means. The difference between the means 
of the raw score distribution was much 
smaller. All differences were attributed to 
chance. The rankings based upon the sums 
of the mean gains under each method indi- 
cated the equal effectiveness of the lecture- 
discussion and experimental methods. The 
lecture method ranked lowest. 
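
The criterion applied above, that a difference be at least three times its standard error before it is treated as other than chance, can be written out as a short calculation. The function name `critical_ratio` and the figures in the example are assumptions for illustration; the formula shown is the usual one for two independent groups and may differ from the one the investigator used for paired groups.

```python
from math import sqrt


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of the difference."""
    se_difference = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_difference


# Hypothetical mean gains for two groups taught by different methods.
ratio = critical_ratio(mean_a=12, sd_a=6, n_a=36, mean_b=10, sd_b=8, n_b=64)

# Under the three-standard-error criterion this difference is attributed to chance.
is_significant = abs(ratio) >= 3
```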

The reactions of students indicated slight 
preference for the traditional methods be- 
cause they required less work and thought. 
They considered the experimental method 
best for accumulating facts and as a proce-
dure for studying other things.

In the first part of the second study, where
the means and sigmas of the classes as a 
whole for three years were compared, there 
was found a striking comparability of per- 
formance in all aspects with no critical ratio 
over 2.3. The degree of variability was 
about equal, and differences are explicable 
upon the basis of chance. The comparison 
of the groups when paired revealed the same 
trends. The rotation of methods group per- 
formed less well than either the experimental 
or lecture-discussion groups of other years. 
The experimental group appears to have per- 
formed slightly better than the lecture-dis- 
cussion group. 

There was a significant difference in per- 
centage of students who continued Biology III 
voluntarily during the year when the experi- 
mental method was followed. This behavior 
is indicative of interest. 

No significant differences appear in the 
comparisons between any of the groups of the 
lowest and highest third in mental ability. 
Slight mean differences favor the experi- 
mental procedure for all levels of mental 
ability. The degree of variability was con- 
sistently less under the lecture-discussion 
method regardless of level. The achievement 
of the highest third was approximately equal 
under all methods. The performance of the 
lowest third was the most variable. 

The instructor’s observations substantiate 
the results of the performance tests. 

In the second part of the second study, 
where the paired populations from the three 
normal schools are compared, the superiority 
of the experimental group over both of the 
control groups in all aspects of achievement 
was demonstrated. Most of the mean differ- 
ences are statistically significant. The lec- 
ture-discussion group rated slightly higher 
than the lecture group. The comparison of 
the performance of the paired ability groups 
revealed the same results. The performance 
in all aspects measured was directly propor- 
tional to the ability of the group considered, 
regardless of method. The lower third per- 
formed slightly better and more consistently 
under the lecture-discussion procedure. 

In the comparison of the 1933-34 lecture- 
discussion group with the 1935-36 lecture- 
discussion and lecture groups, the results sub- 
stantiate those indicated in the previous com- 
parison. The mean differences are not as 
great. This check indicates another factor 
besides method: the teacher factor.


CONCLUSIONS


1. The lecture, lecture-discussion, and ex- 
perimental methods were about equally effec- 
tive in attaining the objectives of the survey 
course in general biology, as measured by 
standardized and teacher-made tests and other 
evaluating devices. 

2. Based upon slight differences in per- 
formance, the methods may be rated as fol- 
lows: first, experimental; second, lecture-dis- 
cussion; and third, lecture. 


a. With respect to both immediate and de- 
layed retention and gain in knowledge 
of subject matter, the methods rank: 
first, experimental; second, lecture-dis- 
cussion; and third, lecture. 

b. With respect to the ability to apply 
principles, the methods are nearly equal. 

c. With respect to the ability to solve prob- 
lems, the experimental method is favored 
by slightly larger mean differences. No 
significant differences existed.

d. With respect to the ability to perform
in the laboratory, the methods are 
approximately equal. 


3. The relationship between performance 
and mental ability is little affected by the 
different teaching procedures. The highest
ability groups perform equally well under all
methods. The lowest ability groups seem to
perform slightly better under the lecture-dis-
cussion method. Their achievement was the
most variable. 


4. Student preference, as determined by 
questionnaire and interview, leans toward the 
traditional methods, but the experimental 
method is regarded as most interesting. 


5. The investigation demonstrates that the 
newer, informal problem types of method are 
as applicable in biological science on the 
college level as are the traditional procedures. 


6. The findings of this study corroborate 
those of the other investigators in this field.


VARIABILITY AS A MEASURE OF COMPETITIVE BEHAVIOR


JAMES VAUGHN AND EDWARD GELDREICH
University of Cincinnati 


PURPOSE 


The studies which form the content of this 
report were initiated for the purpose of ascer- 
taining and explaining the effects of com- 
petitive influences of different sorts on be- 
havior. It became apparent, however, in the 
course of the analysis of the objective results 
that certain methods, commonly used for 
analysis, tended to obscure and disregard im- 
portant information. This report, therefore, 
is concerned not only with the problem of 
describing and explaining the effects of com- 
petitive influences on behavior, but also with 
some methods that are useful in psychological 
analysis but commonly neglected. 


GENERAL PROCEDURE 


Three series of experiments were per- 
formed. Ten persons accommodated the 
writers, in the first series of experiments, by 
working under three conditions of competi-
tion.¹ The form of behavior measured was
that involved in shooting at a target with a 
.22 calibre Springfield rifle. The conditions 
of competition were called “high score”, 
“handicap”, and “improvement”. The “high 
score” condition did not involve a handicap— 
the person making the highest score was 
awarded the prize. Handicaps were estab- 
lished by a three-week period of preliminary 
training, involving 300 shots for each person. 
The “improvement” condition regarded im- 
provement as a criterion of achievement. 
The person showing the greatest improvement 
over his preliminary records was given the 
prize. Similar gold medals were awarded for 
the three conditions. 
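
The three prize rules can be made concrete with a short sketch. It is illustrative only: the report does not state how the handicaps were computed, so the rule used here (a fixed fraction of the gap between a competitor's preliminary average and the best preliminary average), together with all names and scores, is an assumption and not the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class Shooter:
    name: str
    preliminary: float  # average score in the preliminary training period
    match: float        # average score in the competitive match


def high_score_winner(shooters):
    """'High score' condition: the best raw match score wins; no handicap."""
    return max(shooters, key=lambda s: s.match).name


def handicap_winner(shooters, fraction=0.8):
    """'Handicap' condition: weaker preliminary records receive extra points.

    ASSUMPTION: handicap = fraction * (best preliminary average - own
    preliminary average). The report does not give the actual rule.
    """
    best = max(s.preliminary for s in shooters)
    return max(
        shooters,
        key=lambda s: s.match + fraction * (best - s.preliminary),
    ).name


def improvement_winner(shooters):
    """'Improvement' condition: the largest gain over the preliminary record wins."""
    return max(shooters, key=lambda s: s.match - s.preliminary).name


# Invented scores, for illustration only.
shooters = [
    Shooter("A", preliminary=12.0, match=12.5),
    Shooter("B", preliminary=8.0, match=9.9),
    Shooter("C", preliminary=10.0, match=11.8),
]

print("high score :", high_score_winner(shooters))
print("handicap   :", handicap_winner(shooters))
print("improvement:", improvement_winner(shooters))
```

With these invented scores each rule selects a different winner, which is the sense in which the three conditions offer different opportunities to different competitors.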


During the course of the first series of 
experiments certain observations led the ex- 
perimenters to conclude that important im- 
plicit behavior was taking place, and not being 
measured. Consequently, a second series of 
experiments was outlined, in which the skin 
galvanic responses during the three conditions 
of competition were measured. Edward Geldreich
conducted these experiments.


¹Complete records and a detailed discussion of method are
presented in Applied Psychology, XX (February, 1936), 1-15.
Four persons, who had not served in the
first series of experiments, acted as subjects. 
They worked individually, flipping checkers 
at a target with the index finger of the right 
hand, while the first phalanges of the second 
and third fingers of the left hand were in- 
serted in liquid electrodes of the Malmud 
type. Gross muscular movements and action 
currents were eliminated by this procedure. 
The conditions of competition were the same 
as those employed in the first series of experi- 
ments. The skin galvanic response for each 
shot was measured.²


The comments of the persons who served as 
subjects in these experiments, and the objec- 
tively collected data, pointed so strongly 
toward desire and confidence playing leading 
roles in determining individual reactions to 
competitive influences that Professor Vaughn 
decided to conduct a third series of experi- 
ments in which confidence was measured and 
compared with variability. Fourteen per- 
sons participated in this study. They were 
ranked according to variability in reaction 
time,³ and this order, in turn, was interpreted
in terms of the Bernreuter traits of personality. 


RESULTS 


The results of the first series of experi- 
ments, analyzed by the more commonly 
employed group method, are presented in 
Table I. 


On the basis of these records, one is justi- 
fied in concluding that the group, as a whole, 
was more efficient under the “handicap” and 
“improvement” conditions than under the 


²Geldreich, “The Use of a Calibrated Potentiometer
in the Measurement of the Galvanic Skin Response”,
American Journal of Psychology, XLVII (1935), 491-93.
³A specially devised [apparatus] was used for measuring
reaction time.
“high score” condition. Very often psychological
analysis has stopped at this point.
It is, however, very important, from a psycho- 
logical standpoint, to note the fact that varia- 
bility is different under the three conditions. 
The behavior of the subjects was more vari- 
able or erratic under the “high score” condi- 
tion than under the other conditions. The 
reports of the subjects clearly indicate this 
condition of relative variability to be a func- 
tion of confidence in the outcome of the vari- 
ous matches. As a group they were more 
variable under that condition of competition 
which provided least opportunity for gaining 
superiority. 


TABLE I


DISTRIBUTIONS OF SCORES UNDER DIFFERENT
CONDITIONS OF COMPETITION


Range of           Conditions of Competition
Scores       High Score    Handicap    Improvement
...              208          190          197
...              227          262          245
...              235          245          257
...              220          254          259
...              207          208          202
...              188          158          175
...              129          129          117
...               94          121          101
...               77           75           75
...               71           44           53
...               53           36           49
...               27           24           29
...               28           17           13
...               13           13            5
...                6            9            6
...               17           15           17
Mean           10.84        11.05        11.07
S. D.           3.32         3.16         3.14
...             .055         .052         .052


Dσ (HS-H) = .16 ± .04 P.E.       DM (HS-H) = -.21 ± .039
Dσ (HS-I) = -.14 ± .04 P.E.      DM (HS-I) = .23 ± .038
Dσ (H-I) = -.02 ± .04 P.E.       DM (H-I) = .02 ± .038
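
The reliability of a difference reported with its probable error can be restated as “chances in 100”. The sketch below assumes a normal sampling distribution and the conventional relation P.E. = 0.6745 σ; the function name is hypothetical, and the report does not say that its own figures were obtained in exactly this way. The inputs are the differences between means printed under Table I, with signs as printed.

```python
import math

PE_PER_SIGMA = 0.6745  # one probable error equals 0.6745 standard deviations


def chances_in_100(difference: float, probable_error: float) -> float:
    """Chances in 100 that the true difference has the sign of the observed one.

    ASSUMPTION: the sampling distribution of the difference is normal.
    """
    sigma = probable_error / PE_PER_SIGMA          # convert P.E. to sigma
    z = abs(difference) / sigma                    # difference in sigma units
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Differences between means and their probable errors, copied from Table I.
mean_differences = {
    "HS - H": (-0.21, 0.039),
    "HS - I": (0.23, 0.038),
    "H - I": (0.02, 0.038),
}

for label, (d, pe) in mean_differences.items():
    ratio = abs(d) / pe
    print(f"{label}: D / P.E. = {ratio:.1f}, "
          f"chances in 100 = {chances_in_100(d, pe):.1f}")
```

On these figures the two comparisons involving the “high score” condition are many times their probable errors, while the handicap-improvement difference is smaller than its probable error and is therefore little better than an even chance.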


We were fortunate, in outlining the experi- 
ments, to require of the subjects sufficient 
behavior to make detailed studies of indi- 
viduals. A fairly large number of studies 
fail to do this, although it is generally recog- 
nized that individuals display important vari- 
ations in behavior toward any and every 
condition. 

Statistical summaries of the individual records
are presented in Tables II and III.⁴
These tables contain statistical measures 
which show the reliability of differences 


⁴Complete individual records may be found in Applied
Psychology, XX (February, 1936), 1-15.
between means and standard deviations in
individual cases. 


For purposes of comparison, Table IV,
showing important individual peculiarities in
reaction to competitive influences, has been 
constructed. 


It is evident from a cursory inspection of
these records that individuals behave differ- 
ently under competitive influences. The fore- 
going conclusion, to the effect that everyone 
does more efficient work when the more efficient
are handicapped and when improvement
is a criterion of efficiency, is misleading be- 
cause some facts of fundamental importance 
are ignored. It is, of course, true that the 
majority of individuals perform more efficiently
under those conditions, “handicap”
and “improvement”, which, in these experiments,
tended to place a premium on low
initial ability; but there are others, relatively 
few in number, whose level of achievement is 
greater when they are striving to achieve abso- 
lute superiority. In some cases they are the 
most efficient to begin with, but in other cases 
they have only the desire to win without the 
more efficient being handicapped. There are 
also differences in level of achievement between 
the “handicap” and “improvement” conditions. 


The study of individual variability under 
the different competitive influences is inter- 
esting. It is no accident that five persons 
displayed greatest variability under the “high 
score” condition, while for some the variabil- 
ity was greatest under the “improvement” 
and “handicap” conditions. 

The verbal reports and discussions of the 
competitors clearly associate variability with 
desire and confidence. It is an inevitable 
conclusion that variability is greatest under 
those conditions where one desires to win but 
lacks confidence in his abilities. Where 
desire to win is associated with confidence, 
variability is generally least, and the level of 
achievement is generally highest. 

It is obvious that individual analysis is 
productive of psychological information of
fundamental importance. This is not a new 
idea. It is also quite clear that individual 
variability should be taken into consideration, 
and that psychological studies should be 
arranged, whenever possible, for such a pur- 
pose. Psychological analysis, in terms of the 
individual, might clarify a great many dis- 
crepancies which have appeared in different 
studies of efficiency, learning, and motivation. 
TABLE IV


INDIVIDUAL LEVEL OF ACHIEVEMENT AND VARIABILITY UNDER CONDITIONS OF COMPETITION


Conditions of Competition 


Sub- High Score 
jects Mean S. D. 
Hu ; See ; es 100* 
Bi ; ie 100 
Od — — 50 
Si ity + - we 
Wi + 

St : 

Cu a + 


Mean 


Handicap Improvement 
S. D. Mean S. I 
= 100 
—_— : “50 
3 98 
- &5 
- 90 


100 


+ ES ee 


*Chances out of 100 that the level of achievement under the condition was highest.


+ Condition of greatest variability. 
—Condition of least variability. 


The results of the study concerned with the 
skin galvanic responses are displayed in 
Table V. For present purposes we are pre- 
senting only the comparisons between stand- 
ard deviations. The differences between the 
averages were compared and found generally 
unreliable. The subjects’ initials are presented 
in the first column; the second, third, and 
fourth columns contain the standard deviations 
under the different conditions of competition. 
Asterisks indicate reliable differences. 


The general consideration of variability is 
interesting, primarily, because it sheds light 
on the meaning of variation in the magnitude 
of the skin galvanic response. It will be ob- 
served that variability is greatest under the 
condition of open competition, and least under 
the condition where improvement is the cri- 
terion of efficiency. This general order, how- 
ever, is not characteristic of every subject. 
“ER”, for example, displays greatest variabil- 
ity under “improvement.” 


TABLE V
VARIABILITY IN THE PSYCHOGALVANIC
RESPONSE UNDER THE DIFFERENT
CONDITIONS OF COMPETITION


                 Condition of Competition
Subject     High Score    Handicap    Improvement
...           14.750*      11.400*       8.010
ER            13.080       12.790*      14.115
...           10.088*       9.015*       7.197
...            9.197*       8.614*       4.840
Ave.          11.766       10.455        8.540


* Reliable differences: millimeters of deflection
indicated by scores.
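
The comparisons drawn from Table V can be checked mechanically. The sketch below copies the standard deviations from the table; only “ER” is identified in the text, so the other row labels are placeholders, and the helper names are hypothetical.

```python
from statistics import mean

CONDITIONS = ("high score", "handicap", "improvement")

# Standard deviations (millimeters of deflection) copied from Table V.
# Only "ER" is named in the text; the other labels are placeholders.
table_v = {
    "subject 1": (14.750, 11.400, 8.010),
    "ER": (13.080, 12.790, 14.115),
    "subject 3": (10.088, 9.015, 7.197),
    "subject 4": (9.197, 8.614, 4.840),
}


def most_variable_condition(sds):
    """Name of the condition with the largest standard deviation."""
    return CONDITIONS[sds.index(max(sds))]


def column_averages(table):
    """Average standard deviation under each condition, across subjects."""
    return tuple(mean(column) for column in zip(*table.values()))


for subject, sds in table_v.items():
    print(f"{subject}: greatest variability under {most_variable_condition(sds)}")

for condition, average in zip(CONDITIONS, column_averages(table_v)):
    print(f"average S.D. under {condition}: {average:.3f}")
```

Run on the tabled values, this reproduces the general order described in the text (open competition most variable, improvement least) and singles out ER as the exception.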


It has already been shown that variability 
in overt behavior is associated with confidence. 
The skin galvanic response now appears to 
be associated with confidence. By assuming 
that lack of confidence or doubt exerts an 
inhibitory effect on overt expression, the skin 
galvanic response appears to be an indication 
of an alternate form of expression. The 
energy of conation, encountering inhibition, 
diffuses internally and the sweat glands re- 
ceive a share. The fact that the level of 
response does not change, as shown by the 
means, may be taken to indicate oscillation 
between confidence and doubt. The person 
does not undergo one continuous experience 
of doubt—there is for all conditions a mixture 
of the doubtful and confident attitudes. Some 
conditions contain more of one attitude than 
the other. This interpretation reminds one 
of the studies of Aveling⁵ and Cattell.⁶


The results of the final series of experi- 
ments are presented in Table VI. 

The table contains measures of variability 
and Bernreuter indices of personality traits 
for seven subjects. We have taken extremes 
in the group in order to simplify comparisons. 

If, in a study of this sort, one could deal
with the group and ignore individuals, the 
task of drawing a conclusion would be simple. 


⁵F. Aveling, “The Conative Indications of the Psychogalvanic
Phenomenon,” Proceedings, International Congress
of Psychology, 1926, pp. 227-234.
⁶R. B. Cattell, “The Significance of the Actual Resistance
in the Psychogalvanic Experiments,” British Journal of Psychology
(General Section), XIX (1928).
R. B. Cattell, “Experiments on the Psychical Correlates of
the Psychogalvanic Reflex,” British Journal of Psychology.
R. B. Cattell, “The Subjective Character of Cognition and the
Presensational Development of Perception,” British Journal
of Psychology Monograph Supplement No. 14, 1930.
TABLE VI


VARIABILITY IN REACTION TIME IN RELATION
TO BERNREUTER PERSONALITY TRAITS*

           σ in
Subject    reaction            Personality traits
           time      B1-N   B2-S   B3-I   B4-D   F1-C   F2-S
B          22           3     49      4     79      8     27
V          23.2         8     88     11     96     18     68
C          ...         63     68     61     52     62     75
S          ...         94     89     89     18     82     93
Ave.       24.8        42     72     41     61     42     64

V          37.7        91     68     93     46     87     97
W          40.2        65     38     46     14     66     58
D          43.5        93    156     86     12     89     67
Ave.       40.5        83     40     75     24     81     74


*Bernreuter Indices are given in percentages.


One could say that the least variable group is 
most stable emotionally, self-sufficient, extra- 
verted, moderately dominant and confident, 
and fairly sociable. The most variable group 
is emotionally unstable, lacking in self-suffi- 
ciency, inclined toward introversion, rather 
submissive, lacking in confidence and slightly 
more affected by the social group than the 
least variable group. This is a good picture 
and not out of line with sound theory. There 
appear, however, to be individual differences. 
It is apparent that constancy may be achieved 
in different ways, but in the exceptional individual
patterns that appear, the relations
between desire and confidence appear to be
important factors in the adjustments. The 
Bernreuter test characterizes the dominant 
and variable individual as one lacking
confidence. 
CONCLUSIONS


The results of these experiments justify 
several conclusions: 


1. Level of achievement is a function of 
the individual’s appraisal of the situation be- 
fore him. The higher levels of achievement 
are made in those situations which provide 
better opportunities for success along indi- 
vidually selected lines. 


2. Variability in behavior, implicit as well 
as explicit, is a function of the relation between 
desire and confidence; erratic behavior issues 
from lack of confidence. 


3. Confidence exerts, simultaneously, an in- 
hibitory and a facilitating effect on behavior. 
The inhibition of overt expression, through 
lack of confidence, results in the deflection of 
the energy of conation to internal bodily or- 
gans, and these become more active than if 
overt expression were not inhibited. The 
skin galvanic response reflects the expectant 
attitude of the subject. 


4. Oscillation between confidence and doubt 
occurs for each individual under different con- 
ditions of competition. The oscillation is 
generally greatest in open competition, but in 
individual cases it may be associated with 
other conditions. 


5. Detailed analytical studies of individuals 
are productive of a type of information that 
is often obscured in group analysis. In such 
studies, measures of variability may provide 
information that cannot be secured in other 
ways. 
THE EXPERIMENTAL PSYCHOLOGY OF COMPETITION


JAMES VAUGHN AND CHARLES M. DISERENS 
University of Cincinnati 


EVOLUTION OF THE CONCEPT 


The principle of competition was first clearly 
recognized in the field of economics where 
Adam Smith (105) and the French Physiocrats
used it to explain the spontaneous and
for the most part unconscious efforts, relation- 
ships, and social structures which the members 
of a society have developed in the pursuit of 
the means of subsistence. For a modern 
account of the nature of economic motives the 
reader may consult Dickinson (33). Such 
writers as Hobbes (47) and Machiavelli have, 
of course, anticipated certain aspects of com- 
petition in their general theory of society as a 
“war of each against all”; and even an early
Greek philosopher, Heraclitus (500 B. C.), 
may have had a vague idea of the psycho- 
logical effects of competition, if we may judge 
from his obscure saying that “war is the 
father of all.” Ancient Greek life was organ- 
ized on the basis of competitive motives, which 
were encouraged in every way, especially in 
athletic and artistic contests and in daily 
politics (97). The eighteenth century econ- 
omists, however, were the first to attempt to 
explain in detail a whole phase of human 
social activity, and one of them, T. Malthus 
(70), applied the principle to explain the 
cause of survival in human population. 


The biologists, led by Darwin (28), ex- 
tended the principle of competition to the 
whole range of organisms, both plants and 
animals, under the name of the “struggle for 
existence” (29). This is essentially competi- 
tion between species or members of the same 
species for the means of subsistence and re- 
sults in the elimination of the unfit. It cor- 
responds exactly to the economic competition 
in human society, except that it is simpler. 
Competition is, then, if not the mainspring, at 
least the chief factor of active adaptation in 
living beings. Most biologists followed Dar- 
win, and an extensive literature on the forms 
and mechanism of competition in the plant 
and animal world has appeared and is sum- 
marized in the various textbooks of oecology. 
A partial summary is given by Park and Bur- 
gess (89). Further details may be gathered 


from the works of Allee (3), Alverdes (6), 
and Williams (122). 

The sociologists (89) utilized the principle 
and recognized competition as one of the four 
great forms of social interaction or social 
process, viz., competition, conflict, accommo- 
dation, and assimilation. 

Competition may be regarded as a uni- 
versal principle, one fundamental to all nat- 
ural sciences, that is, to science which seeks 
to describe change in terms of a process occur- 
ring between distinct units or elements, e.g., 
physics, chemistry, biology, psychology, and 
sociology. The logical principle is the same 
in all these fields of phenomena, although the 
elements and forces involved are different. 


EVOLUTION OF THE PROCESS OF COMPETITION 


We might assume that competition in its 
simplest form is a mere interaction and modi- 
fication of movements, due to the presence of 
a number of organisms in a circumscribed 
area, each following the laws of its nature or 
the direction of the forces inherent in the 
system. In such cases there is no internally 
determined reference to the movements of 
other organisms. In short, there is no guid- 
ance from within of a mental character, 
though the prototype of conscious guidance 
may be present. The fact of competition is 
scarcely more psychological than the move- 
ment of the balls on a pool table when the 
initial player breaks the set. To a spectator 
the balls may seem to compete more or less 
in their progress toward the other end of the 
table. There is interference and modifica- 
tion of movement, but no control or aware- 
ness of the process on the part of the ball. 
It is a phenomenon of the resolution of physi- 
cal forces. Analogous phenomena occur among 
units at the chemical and physiological levels. 
In mixtures of chemical solutions the constitu- 
ents seem to compete in ionization. In the 
animal body cells, tissues, and organs often 
seem to compete for mastery, the result being 
a relative dominance in one cell, tissue, or 
organ, and an inferior development in others. 
In any event the result is equilibrium in the 
inorganic world, in physiology, and in animal 
and human society. Competition is always
a way of attaining some kind of equilibrium 
(89). It is a resolution of forces (89). 


Perhaps for the pure behaviorist, competi- 
tion is nothing more than this, however com- 
plex the appearances may be. As living beings 
evolve, however, the appearances grow very 
complex and what seems to be a kind of in- 
ternal guidance of the organism, with refer- 
ence to the movements of other organisms 
toward the same goal, arises. The physical 
or physiological process takes on a character 
of self guidance, and in human beings at least, 
of awareness, at first partial and finally com- 
pletely conscious of both the process and its 
intention. In short, competition becomes con- 
scious rivalry or emulation and thus becomes 
a drive or motive in the commonly accepted 
sense of these terms. Just when competition 
emerges as a drive, occasionally completely
conscious, is a matter of speculation; but we 
seem to find it definitely among the higher 
social vertebrates, where the powerful males
compete for the leadership of the group, e.g., 
the buffalo herd or the wolf pack. A genetic 
study of competition among animals in gen- 
eral seems to prove that competition for food 
and habitat between members of the same 
species and even between different species 
or even between certain animal and vegetable 
organisms is universal (29). 


Somewhat later, among the higher animals, 
competition for mates assumes the form of 
passive competition for attention among the 
females and active competition for the fe- 
male’s acceptance among the males, often 
leading to combat between the latter. Dar- 
win (29) has given an invaluable summary 
and commentary on this type of competition 
in his chapters on sexual selection. At a still 
later stage of evolution there appears com- 
petition for group dominance, illustrated by 
the rivalry between stags, wolves, dogs, buf- 
falo, and many other animals which ends in 
the assumption of a kind of leadership on the 
part of the victorious male. 


COMPETITION IN MAN 


In man primitive forms of competition con- 
tinue with undiminished vigor throughout a 
long prehistoric period, and new forms are 
added. The objects or stimuli to competi- 
tion gradually become more varied, i.e., ob- 
jects or purposes merely associated with the 
primary object of competition become stimuli 
to competition in their own right. Competitive
activity is conditioned in all sorts of
ways, and this process has been accelerated 
since the beginnings of recorded history. In- 
deed this conditioning was an essential to 
competition becoming a genuine motive of
behavior rather than a mere mechanism or
means to the satisfaction of a few general 
motives, such as self preservation, sex, and 
mastery. By the end of the paleolithic period 
competitive activity had been conditioned in 
so many ways that it was practically inde- 
pendent of any particular stimulus object and 
it was often practiced as an end in itself. It 
had thus become a true motive, a means of 
motivating activities not necessarily interest- 
ing in themselves. During this time those 
individuals who failed to develop a competi- 
tive consciousness were rigorously eliminated 
by natural selection, thus accounting for the 
almost universal tendency of people to indulge 
in some kind of competitive activities and to 
find satisfaction in the process. 


COMPETITION DURING THE HISTORIC PERIOD


The role of competition in history is shown 
by the practice of contemporary savages and 
of primitive and ancient societies in general. 
The American Indians led a life of competi- 
tion within and without the tribe. For them 
war was primarily competition between tribes 
or different members of the same tribe for 
glory, as among the Iroquois, or for hunting 
lands, as among the tribes of the West. 
Within the tribe the men spent much of their 
leisure in competitive sports, e.g., football, 
wrestling, or target practice; also in gambling 
or in the competitive display of power or 
wealth, as among the Northwest Coast 
Indians (41). 

According to Homer, Greek heroes spent 
much time in competitive sports and during 
the historic period, as Ordahl (88) and David- 
son (32) have pointed out, every city and 
most of the small towns of Greece held annual 
contests. These were usually in athletics but 
often in music and literature, and there were 
prizes of nominal value for competitors of 
different ages and grades of skill. Greeks
from everywhere competed at the games for
glory and a merely symbolic prize, such as a
laurel crown.

A historical survey of methods of education 
seems to show the gradual substitution of 
competition for more primitive means of motivating
the learner. The earliest education
among moderns as well as primitives was play 
and much of the play, as well as organized 
games, was of a competitive character. Really 
primitive peoples like the Australians have no 
formal system of education but, following the 
period of spontaneous competitive play dur- 
ing early childhood, utilize competition, 
rivalry, or emulation as well as punishment 
in the initiation ceremonies of youth which 
serve as a substitute for formal education (45).


In the earliest civilizations such as those of 
China, Chaldaea, and Egypt, where formal 
education first appeared, the means of persuasion
were the rod, whip, or other forms of
corporal punishment, but the spirit of com- 
petition was usually fostered as a substitute 
for pain or brute force. This may be illus- 
trated with equal care from the history of 
pedagogy in Greece and Rome or in England 
and America (32). 

The general sequence in the development 
of educational practice is from a stage where 
punishment is the chief motive to a stage 
where many motives are called upon, but 
competition is always one of the motives in- 
volved because it is inherent in the social 
system. Educators have always utilized com- 
petition as a motive for superior attainment, 
but perhaps inadequately. Hale (44), speak- 
ing of the universities, declares that “if the 
faculties and boards will do for scholarship 
competition what the rule makers, schedule 
makers, referees, and score keepers do for 
athletics we can create an interest in scholar- 
ship that does not now exist.” He notes that 
socially recognized forms of competition 
usually employed in combination with prizes 
include such things as: 




1. Horse racing to improve the breed of 
horses. 

2. Automobile racing and touring for 
prizes to improve the designs of cars. 

3. The game of Kriegsspiel to interest 
military students. 

4. Prizes for rifle shooting to incite sol- 
diers to practice shooting. 

5. Competition between members of a 
sales force to see which can get the 
most orders in a given time. 

6. Competition between gangs of men on 
construction work. 

7. Prizes for poems, orations and essays 
at colleges and elsewhere. 
8. Prizes like the Edison medal, the Kelvin
medal, and the Nobel prizes.
9. Competition at drop kicking between
members of a football squad to develop 
a drop kicker. 

10. Prizes for lawn tennis given by hotel 
keepers to fill their hotel.” 


To this list might be added the many forms 
of contest involving the solution of puzzles, 
the invention of slogans, and the attractive 
descriptions of sales articles, staged by many 
manufacturers to promote increased sales of 
their goods. Indeed, the list of socially rec- 
ognized forms of competition might be ex- 
tended indefinitely, and there are also per- 
sistent forms of competition without general 
social recognition, e.g., in gambling. It is 
clear that social scientists seeking a basic de- 
terminant of social phenomena might find in 
competition a more important factor than 
complacency, consciousness of kind, cooperation,
and other characteristics of men which
have been put forward as fundamental. 

The older Darwinian sociologists tried to 
explain many social phenomena by competi- 
tion, but conceived it as a mere mechanism of 
survival, rather than a living drive. Recent 
writers make better use of the term. Cooley 
(26) stresses the role of emulation and rivalry 
in social organization, and Ross (94) char- 
acterizes the spheres of effectiveness of the 
competitive motive in society. Allport (5), 
in an objective treatment of social psychology, 
makes extensive use of competition as an ex- 
planatory factor in social behavior, and gives 
a partial review of the experimental literature 
of competition. A more complete review is 
presented by P. T. Young (125). 


THE EXPERIMENTAL LITERATURE 
Types of Investigations 
Studies of competition which may properly 
be included in a survey of the experimental
literature may be grouped in several
categories: 


1. Theoretical studies pertaining to the 
types of competitive situation, condi- 
tions under which the competitive spirit 
may be aroused, and special effects of
competition. 

2. Naturalistic and statistical studies of the 
extent to which competition enters into 
certain spontaneous activities of men, 
such as plays and games. 
3. Statistical studies by educators on the
effect of competition in stimulating
scholarship as based on class ranking, 
general and special achievements, honors, 
prizes, awards, etc. 

4. Studies by industrial psychologists in 
which competition enters as one of the 
conditions of a trade or activity, e.g., 
bricklaying, loading, etc., but where the 
primary object of study is some other 
factor such as bonus, length of rest 
pauses, motion pattern, degree of fatigue, 
etc. 


5. Precise experimental studies of various 
activities where competition is an inci- 
dental factor or is introduced as a 
motive of assumed efficiency but in 
which competition is not the primary in- 
terest of study. Such studies, however, 
contain incidental data of considerable 
suggestiveness. 

6. Finally, experimental studies of a pre- 
cise character where the efficacy of com- 
petition as a motivating factor under 
varying conditions is the primary interest 
of the investigation. 


THEORETICAL STUDIES 


Several studies of a theoretical character 
are sufficiently related to the experimental 
analysis of the psychological effects of com- 
petition that they may be included as a part 
of the discussion of the experimental litera- 
ture on competition. They lay the founda- 
tion for the experimental attack of the prob- 
lem, and in many cases involve actual experi- 
ments. They also indicate the logical limits 
of analysis. Such studies include those by 
Bills (12), Burnett and Pear (16), Riddle 
(92), Vernon (117), Spear (107), Harrison 
(46), Griffith (43), Vaughn (115), Whitte- 
more (120), Leuba (63, 64), Rogers (93), 
Baumgarten (11), Forlano (38), and Mead 
(78). 

Two very important problems, from an ex- 
perimental point of view, have to do with the 
types of competitive situation, and the con- 
ditions under which the competitive attitude 
may be aroused. Bills (12) has suggested 
three general directions which competition 
may take, namely, (1) competition in which 
the competing individual is a member of a 
group competing against another group, (2) 
competition of one individual with another, 
and (3) competition against one’s own record. 
These categories do not exhaust the special
arrangements which are peculiar to particular 
experiments, such as those by Triplett (111), 
Vaughn (115), and others; but, they are suf- 
ficiently general to allow the inclusion of 
special modifications. A number of experi- 
ments by Hurlock (52), Gates (40), Travis 
(109, 110), and Laird (60) have studied the 
effects of the group on individual perform- 
ance so that it seems necessary to designate 
this type of influence as a special type of 
competitive situation. 

The conditions under which the competi- 
tive spirit may be aroused are numerous. 
Practically any situation where energy is ex- 
pended in work or learning is an occasion for 
arousing the desire to achieve superiority, 
and this always predicates the competitive 
attitude. It seems unnecessary for one to 
have knowledge of previous records, or for 
one’s competitors to be present in person, 
since one can compete against imaginary rec- 
ords or rivals. To be sure the effects of com- 
petition may vary according to the conditions 
surrounding activity (115), but the psycho- 
logical disposition may be present in degree. 
Furthermore, rivalry may be either inten- 
tional and made explicit by directions or im- 
plicit in the situation. The fact that a given 
experimental set-up does not utilize specific 
instructions for arousing the desire to excel 
does not indicate that the competitive atti- 
tude may not be aroused. It is, of course, 
true that specific instructions designed to cre- 
ate the competitive urge in an experiment 
that undertakes the analysis of the effects of 
competition need not arouse the desire to 
compete. It also very often occurs that in- 
structions of one kind, formulated very pre- 
cisely, for the purpose of creating a particular 
kind of competitive attitude may develop an 
entirely different attitude (115). 

The problem of eliminating competition or 
reducing it to a minimum in order to study 
the relative effects of other motives has been 
discussed by Allport (4), Williamson (123), 
and Young (125). According to Allport, social
facilitation is the influence of a social
group upon its individual members. In order 
to investigate the exact influence of this fac- 
tor upon members of a group, in contrast to 
solitary workers, Allport conducted experi- 
ments in which by keeping constant the 
light, air, seating, and furnishings, he assumed 
that rivalry and the facilitating presence of 
others were the only variables. These variable
factors produced a distinct increase in
quality and quantity of group work. How- 
ever, inasmuch as the subjects were instructed 
to work at maximal speed, upon the same 
tasks, within a set time, with no comparison 
of achievement, he felt that rivalry had been 
eliminated, leaving social facilitation as a 
single variable. Young (125) apparently 
agreed with this conclusion. 

Williamson (123), however, has given sev- 
eral trenchant criticisms of this procedure. 
He writes to the effect that it is not evident 
that the expedients named would eliminate 
the factor of rivalry. The subject may have 
an incentive within himself to compete against 
time or a standard of accuracy. The only 
instruction was the verbal command of the 
instructor. The possibility of self-instruction 
was overlooked. The only facts we know 
about the subjects are their number and class 
and grouping. There is no indication of elim- 
inating those accustomed to group perform- 
ance. The tabular reports of results do not 
support Allport’s claim. Allport denies so- 
ciality to a solitary person, whereas he may 
have ideal personalities or personalized things, 
so that one can never be sure of eliminating 
social setting and influence. 

Practically any task, involving either cog- 
nitive or motor abilities or both, may be 
studied with reference to the effects of the 
competitive attitude on the performance of 
the task. The majority of studies, however, 
have utilized the simpler abilities, apparently 
for the reason that simple abilities lend them- 
selves to objective measurement more readily 
than complex abilities. The efficiency of 
work, with the learning factor controlled, has 
been the object of study far more frequently 
than the efficiency of learning. This prob- 
ably reflects the interests of practically minded 
men. Very little attention of an experimental 
sort has been given to the effects of competi- 
tive situations on the learning activities of 
children in school, and of adults. In educa- 
tional circles various sorts of projects in con- 
nection with the school subjects have been 
developed for the purpose of rating achieve- 
ment and increasing achievement by the in- 
troduction of competitive situations, but very 
little has been done by way of a systematic 
analysis of the effects of competitive attitudes 
or situations on the development of the com- 
plex mental processes that may be involved 
in such subjects as arithmetic, civics, litera- 
ture, history, etc. Spear (107), for example, 
studied the effects of individual competition
on the efficiency of work in chemistry, phys- 
ics, and general science. Ninety special proj- 
ects or methods for developing competitive 
attitudes in these subjects were outlined. The 
projects included such tasks in general science 
as (1) reading and reviewing a scientific book,
(2) delivering a lecture based on original in- 
vestigation, (3) proposing an original idea, 
(4) making a practical application of an idea, 
(5) learning the wireless code, (6) learning 
the names of twenty great scientists, and (7) 
repairing something broken. Points were 
given for the completion of each project, and 
medals were given for the three highest scores. 
A number of other students have conducted 
somewhat similar studies, the most important 
difference being in the type of competitive 
situation. 


Attention has been called sufficiently often 
to the possibility of specific individual char- 
acteristics, such as race, sex, and age, con- 
tributing to determine the kind of reaction to 
competitive situations. Some studies have 
been made of these problems. Leuba (63), 
for example, noted certain, though not pro- 
nounced, sex differences in competitiveness 
and other motives. Rogers (93), basing his 
opinions on a qualitative study of the results 
of female competition at the Olympic games, 
urges that “women have neither the physical
makeup nor the psychic disposition for com- 
petition.” One is inclined to wonder just 
what the physical makeup and psychic dis- 
position are. Baumgarten (11) discovered 
that girls are strikingly less competitive than 
boys, but observes that such a statement as 
this is contingent upon a particular situation. 
Forlano (38) concluded that sex competition 
is the strongest of all motives to work in pre- 
adolescents. Mead (79), however, has raised 
the question in her study of sex and tempera- 
ment in primitive societies concerning the pos- 
sibility of the social pattern being the deter- 
mining factor. The psychoanalysts (48) have 
suggested that some individuals reveal “vio- 
lent recoil from competition.” There is also 
the possibility that a given person will change 
in the course of experience and display an 
entirely different reaction to competitive in- 
fluences. These are a few of the important 
special problems in the psychology of com- 
petition, and they should be fully investigated 
by the methods of individual psychology. 

COMPETITION IN SPONTANEOUS ACTIVITIES


Studies of competition in spontaneous 
activities may be illustrated by McGhee’s 
(77) statistical investigation of the competi- 
tive motive in play. McGhee studied the 
activities of 8718 school children, including 
3958 boys and 4760 girls of all ages. An 
analysis of their games according to the domi- 
nant interest showed that at the age of six, 
forty-five per cent of the boys’ choices and 
twenty per cent of the girls’ choices were for 
competitive games, while at the age of eighteen
these percentages stood at sixty-three and
sixty-eight, respectively. There was a steady
rise in the curve of interest for competitive 
games. Some years before, Johnson (56), 
dealing with children in New York, arrived 
at similar conclusions. Ravenhill (91), in- 
vestigating the play preferences of 6369 Eng- 
lish school children between the ages of three 
and thirteen years, found a decided prefer- 
ence for “active social games,” varying from 
sixty-six per cent at three to forty-eight per 
cent at thirteen. It was partly on the basis 
of such data as these that Kirkpatrick (58) 
was able to characterize the period from six 
to twelve years as the “age of competitive 
socialization.”


The growing interest of children in com- 
petitive games is probably very largely a 
product of the social and economic systems 
which surround the children. The desire to 
achieve superiority may be present as a part 
of the biological equipment of every human 
being, but the environment in which one is 
reared probably determines the specific char- 
acteristics of the competitive urge. If we 
may generalize observations of the develop- 
ment of various personality traits, the studies 
by Sherman and Henry (103) show this very 
clearly. Recent experiments in Russian schools 
(90, 96) also indicate that the attitude of a 
child is strongly influenced by the adult life 
around it. Children from different social 
strata show different habits in group conduct. 
Adler (1, 2), of course, has pointed this out 
time and time again. In connection with this 
problem, a rather interesting field of study has 
to do with the effects of failure in competi- 
tive activities. This problem is dealt with in 
such studies as those by Sherman (102). 


EDUCATIONAL STUDIES 


The interest of educators in the psychol- 
ogy and pedagogy of competition is perfectly 
natural. In one form or another competition
has been used as a motivating influence dur- 
ing the entire history of pedagogy. It has 
only been within recent times, however, that 
any really serious attempt has been made to 
analyze and understand the effects of differ- 
ent competitive situations on behavior. To 
be sure, Burnham (17) and his student, Triplett
(111), called attention in a mild way to
the serious consequences of the purely me- 
chanical use of the competitive urge, but a 
cursory inspection of various textbooks on 
educational psychology and methods of teach- 
ing clearly indicates that these warnings have 
not been clearly understood. This is natural 
enough, however, in view of the tendency on 
the part of most of us to follow precedent 
until the need for creative achievement has 
been unmistakably emphasized. In the case 
of competition this has been done not only 
through laboratory analyses but in a much 
broader and perhaps more significant way in 
the clinic. Today, one hardly finds a book 
dealing with mental hygiene or abnormal psy- 
chology that does not discuss, either directly 
or indirectly, the important psychological 
effects of competition. It is, of course, well 
known that one entire system of psychology 
is devoted almost exclusively to the problem 
(1, 2). 

In a broad sense any investigation of the 
psychological effects of competition is of edu- 
cational importance, and it might be presumed 
that any study of educational importance is 
of general practical and industrial interest. 
The studies, however, which are more or less 
arbitrarily included as definite contributions 
in this section of the report are those by 
Schmidt (98), Mayer (74), Meumann (80, 
81), Moede (82), Muller (85), Hurlock (40, 
50, 51, 53), Bykowsky (21), Maller (68), 
Charles (25), Leuba (65), and Bos (13). 

The earliest studies of competition are those 
by Schmidt, Mayer, and Meumann, all in 
1904. Schmidt compared achievement for 
home work in isolation with social class work 
in writing, composition, and arithmetic. The 
classroom work proved definitely superior in 
quality, the home work being marked by a 
greater number of omissions and errors. So- 
cial stimulation and competition were regarded 
as responsible for the superiority. Mayer 
compared work in isolation with that achieved 
in group situations where spontaneous rivalry 
developed. Twenty-eight boys, averaging 
twelve years of age, were tested in five tasks 
as class exercises, and in addition each boy
was required to prepare a similar task for 
comparison in which he sat alone in the class 
room with only the teacher or a colleague 
being present. In general the work of the 
pupils in groups was superior to individual 
work. There were, of course, marked individual
differences, but the openly competitive
situations improved speed from 30 to 50 per 
cent, reduced errors, and increased the uni- 
formity of work. Meumann repeated dyna- 
mometric and ergographic tests on seven 
pupils from thirteen to fourteen years of age.
The tests were given to the pupils individually
and alone, with only the teacher present, with
only the group present, and with both the 
teacher and the group present. Efficiency 
was found to increase with the magnitude of 
the group, i.e., efficiency was highest with 
both the teacher and the group present. In 
other experiments, reported in 1914, Meu- 
mann (80, 81) studied the effect of the social 
situation on memorizing in children from
eight to fourteen years of age. Lists of words 
from four to twelve syllables were read aloud, 
the auditors at once recording all words remembered.
The younger children remembered more when tested in
the group; the older children were scarcely affected. The
effect of the social situation varied inversely 
with age. Perhaps the element of competi- 
tion, inherent in the situation, had ceased to 
be effective in the older children because the 
situation had become too habitual and un- 
interesting in the course of many school years. 

Several series of experiments are reported 
by Moede (82, 83, 84). One series utilized 
the task of making dots with a pencil at cer- 
tain designated spots on a large sheet of 
paper. The children worked individually, in 
pairs, and in groups of sixteen. They worked 
for a period of ten seconds, rested, and then 
worked for another period of ten seconds. 
When the child worked with the group of six- 
teen, he did on the average eight and one- 
half per cent better than when working alone. 
There were, however, important individual 
differences. When the group was divided into 
good and poor performers and their records 
analyzed, it was found that the inferior work- 
ers profited much more by the stimulation of 
the group than the better performers. Nine- 
ty-three per cent of the performance series of 
the poorer workers were more efficient under 
group conditions, while only fifty-one per cent 
of the performances of the better workers 
were more efficient under the group conditions.
When the working periods were lengthened,
a similar tendency was found, although
the difference between the two groups was not 
so pronounced. The factor of rhythm entered 
into the situation, especially during the longer 
periods; the better workers developed a group 
rhythm and found it difficult to escape. This 
fact has been confirmed by the experiments 
of Busse (20), reported in another section. 

A second series of experiments, utilizing 
the dynamometer, was conducted by Moede. 
The subjects simply grasped the instrument 
and squeezed it as hard as possible. This
they did alone or in pairs. The average score 
when alone was 203, and when in pairs, 224. 
There were, however, marked individual dif- 
ferences, as was observed in the first series of 
experiments. If the same person was paired 
successively with two others, one of whom was 
inferior to the other, it developed that he did 
better with the good rival. Pairing with an 
inferior rival decreased the score. The tend- 
ency in general was quite pronounced for the 
person working with less competition to have 
a lower score than when working with a good 
rival. This finding has also been confirmed 
by later experiments, notably those of Riddle 
(92). 
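
The size of the pairing effect in Moede's dynamometer series can be made concrete with a little arithmetic. The sketch below uses only the two averages quoted above (203 alone, 224 in pairs) and expresses the difference as a percentage. The function name is illustrative, and the units of the dynamometer scores are not stated in the text.

```python
def percent_gain(baseline: float, observed: float) -> float:
    """Relative change of `observed` over `baseline`, in per cent."""
    return (observed - baseline) / baseline * 100.0

# Averages quoted in the text for Moede's dynamometer series.
alone_mean = 203.0
paired_mean = 224.0

gain = percent_gain(alone_mean, paired_mean)
print(f"Gain when paired: {gain:.1f} per cent")
```

On these figures the paired condition stands roughly a tenth above the solitary one. This is somewhat larger than the eight and one-half per cent reported for the dotting task in groups of sixteen, although the two tasks are not directly comparable.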

Moede also studied the effects of one group 
competing with another. The records show 
that when one is simply a member of a group 
competing with another group, he is less effi- 
cient than when working in a group of two. 
Large-group competition does not seem to be 
as effective as small-group competition or 
individual competition. 

Further studies were made of memorizing 
and attention span with the same trend in re- 
sults, i.e., the poorer workers improved more 
than did the better. In a cancellation test 
there seemed to be a tendency to speed up in 
a group, and to make somewhat more mistakes. 

Moede’s work was comparatively early but 
of considerable importance, suggesting vari- 
ous lines of study and in some measure antici- 
pating the results of many later studies. A 
number of other studies may now be treated 
more briefly. 

Bykowsky (21), studying groups of chil- 
dren in Poland, found that competition be- 
tween groups “will bring forth greater effort 
than individual work without competition.” 
Muller (85) measured the influence of com- 
petition on arithmetic and paper cutting tests.
He compared this incentive with other incentives,
such as altruism and practical utility.
Competition always resulted in increased 
work, especially among the younger children. 
This agrees with the results of Meumann’s 
study referred to previously. 

Hurlock (53) divided 155 children of the 
fourth and sixth grades into control and 
rivalry groups for the purpose of measuring 
the effects of group rivalry on work in addi- 
tion. The experiment was made daily for 
one week, and on every day the average score 
of the rivalry group exceeded that of the con- 
trol group. Rivalry proved to be a greater 
incentive for children of inferior ability than 
for those of average or superior ability. We 
may compare these results with those of Mul- 
ler and Meumann who found that youthful- 
ness, which is a kind of inferiority, tends to 
greater competitiveness. 

Elkine (34) studied the effects of the group 
on memory for numbers presented orally. 
Forty children, 19 boys and girls, were tested 
singly and as a group. It was found that in 
all but five cases retention was better when 
the numbers were read aloud to the group 
than to the isolated individual. Elkine ascribes 
this effect to emulation, suggestibility, conta- 
gion, and psycho-neural strain. The five 
children who did better work in isolation were 
shown by tests to be of inferior mentality. 
This experiment may be compared with the 
study of Travis (109) dealing with the effect 
of the group on the performance of stutterers, 
where an unfavorable influence also appeared. 
There seems to be some conflict here with the 
results of the majority of experimenters, but 
it may be that normal and pathological in- 
feriority differ entirely in their effect on com- 
petitiveness, so that there is no real conflict 
in experimental data. 

An extensive and important investigation of 
competition and cooperation was reported by 
Maller (68). His aim was to measure the 
effects of personal and social motivation on 
school children of various ages and social 
status. The basic questions involved were: 
(1) the relative effects of competition and co- 
operation on the efficiency of children’s work, 
and (2) the nature of children’s choices as to 
the type of effective motivation, i.e., personal 
or social. One thousand five hundred and 
thirty-eight children from the fifth to eighth 
grades, ranging in age from eight to seventeen 
years, were used as subjects. A simple meas- 
urement sensitive to initial individual differ- 
ences and individual variability seemed de- 
sirable. This was found in the task of adding
one-place numbers. The digits 1 and 0
were not used, and combinations of like digits 
were avoided. Altogether, 56 possible com- 
binations were constructed, and these were 
presented as 4000 examples in ten booklets. 
The booklets were used as test material for 
five conditions, namely (1) a condition in 
which practice was the only incentive, (2) an- 
other in which individual prizes were given to 
the most rapid workers, (3) a third in which 
a prize was given to the most rapid group, 
(4) a condition in which the individuals were 
allowed to choose between individual and 
group motives, and (5) one in which the im- 
mediate and continuous effects of motivation 
were measured. The results of these experi- 
ments show that work for the self was more 
efficient than work for the group. The work 
curves for the self ascended while those for 
the group descended. Only in the initial 
stages was work under the two conditions 
approximately equal. Both motives increased 
speed above the level of unmotivated work or 
mere performance; however, the correlation 
between self and group work decreased with 
progress in the unit of work. A comparison 
of the first and second minutes of work with- 
in each of the test units displayed a general 
tendency for speed to drop. The speed of 
the first minute of group work was equal to 
the second minute of self work. When the 
opportunity was granted to continue work for 
the self or for the group, the self was chosen 
about three times as often as the group. Com- 
petitive desires were especially noticeable in 
grades five to eight. However, girls were 
more cooperative than the boys, as indicated
by their choices for group work. A maximum
of cooperation appeared in groups of considerable
homogeneity, so that sectioning of
classes on the basis of similar abilities might
tend to promote cooperation within the sections.
One might also surmise that only in
complete democracies, where opportunities
and possessions are somewhat equal, can we
expect any large amount of cooperation.
Maller concludes that the competitive char- 
acter of our social life, illustrated on every 
hand, is undoubtedly responsible for the com- 
petitive character of the individual’s mind. 
Cooperative tendencies are mainly the product 
of education in general. 
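
The figure of 56 combinations in Maller's addition task can be checked by enumeration. The sketch below rests on an assumption the text does not state outright: that a "combination" is an ordered pair of unlike digits drawn from 2 through 9, so that 2 + 3 and 3 + 2 count separately. Under that reading the count agrees with the figure reported; unordered pairs would give only half as many.

```python
from itertools import permutations

# Digits allowed in the task as described: 1 and 0 are excluded.
digits = range(2, 10)

# Assumption: a "combination" is an ordered pair of unlike digits,
# so 2 + 3 and 3 + 2 are counted as two separate examples.
combinations = list(permutations(digits, 2))

examples_total = 4000  # total examples reported in the text
booklets = 10          # number of booklets reported in the text

print(len(combinations))           # number of ordered unlike-digit pairs
print(examples_total // booklets)  # examples per booklet, if divided evenly
```

Whether the 4000 examples were in fact divided evenly among the ten booklets is not stated; the last line only shows what an even division would give.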

A brief study of cooperation and competition
was also published by Forlano (38).
The experimental conditions included: (1)
practice, (2) work for the benefit of the individual’s
class score, (3) work for teams into
which the class was divided, and (4) work 
against the opposite sex. Thirty-four chil- 
dren of both sexes, with an average age of 
eleven years and nine months, participated in 
the experiments. The cancellation test was 
used to measure the relative effects of the dif- 
ferent experimental conditions. The results 
showed that the average child works for indi- 
vidual gain rather than class interest, and that 
sex competition is the strongest motive at 
this age. The conclusions agree with those 
of Maller and Sorokin. 

Greenberg (42), a student of Bühler, made
a study for the purpose of ascertaining the 
causes, frequency, conditions, and character- 
istic phenomena of competition within a given 
situation. The subjects were children from 
two to five years of age. Pairs of children 
were asked to build, with and without in- 
structions, in different series of experiments, 
and were afterward asked to make judgments 
concerning the beauty or magnitude of the 
constructions. The results showed that the 
competitive spirit develops gradually and is 
not universal in children of any one group. 
Ability for the task, as well as individual 
temperament, is a factor in the competitive 
situation. Growth in competitive spirit fol- 
lows an orderly course, developing from a 
state characterized by the absence of com- 
petition, through several stages of conscious- 
ness, to a final stage of vigorous competition 
controlled by judgment. Greenberg’s refer- 
ence to competition, as partly dependent on 
the task and temperament, agrees with the 
more precise statements of Yamamoto (124) 
discussed in a later section. The fact that 
competitiveness is not universal in a child 
group of this age may be compared with 
Whittemore’s discovery that competitiveness is
not universal in adults. Perhaps competitive- 
ness is not universal at any age or there may 
be an optimal age for human competitiveness. 
According to Greenberg, competitiveness ap- 
pears about the fourth year. No data are 
available as to a possible age at which com- 
petitiveness vanishes. 

Leuba (65) confirmed in a general way 
Greenberg’s findings concerning the ontogene- 
sis of the competitive attitude and the age at 
which it appears. Thirty-two children were 
given the task of placing pegs of uniform 
color in a peg-board. They worked singly 
and in pairs. Three stages of development 
were observed. The two-year-old children
displayed little reaction to the presence of
another child working at the same task.
Three- and four-year-old children varied in
reaction, but some showed rivalry responses 
and reduced output. The five-year-old chil- 
dren exhibited definite rivalry responses and 
a markedly increased output in work. 

The latest experiment in this field, one by 
Bos (13), places the emphasis on what the 
author calls productive collaboration, but it 
appears to involve several types of competi- 
tion. Children working in pairs were found 
to achieve more work than when laboring in- 
dividually. This was ascribed to active col- 
laboration or codperative competition. The 
author distinguishes: (1) active collabora- 
tion, (2) resistance-determined collaboration, 
and (3) collective-individual working. It 
would seem that a kind of competition is in- 
volved in the last two of these forms, and in 
any event the minimal competition inherent 
in all situations of social stimulation and 
response. 

A special group of studies deals with the 
influence of the group on the intelligence quotient
and mental test scores. Hurlock (50)
studied the effects of praise and blame on the 
test scores of 257 white children and 151 
negro children from the third, fifth, and eighth 
grades. Both forms of the National Intelli- 
gence Test and the Otis Primary Examina- 
tion were used. After the first test, groups 
of equal ability were subjected to strong 
praise and censure, and told to take the test 
again. Both praise and blame were effective 
in raising the test scores, and certain sex and 
age differences appeared. Older children were 
most affected, and the boys responded more 
strongly than the girls. Younger children 
and superior children were most strongly 
stimulated by reproof. In a subsequent study 
Hurlock (51) reported that the intelligence 
quotients of praised and reproved groups 
were appreciably raised, while those of the 
control group improved but little on retest- 
ing. Another study of Hurlock’s (49) dem- 
onstrated improvement in arithmetic tests 
given to more than 100 fourth and sixth grade
children when praise or blame was used as a 
motivating influence on groups of equal abil- 
ity. In general Hurlock found that praise 
was more effective than blame. This is con- 
firmed by the findings of Gilchrist (39) and 
Briggs (15). Gilchrist gave the Courtis Eng- 
lish Test to a class of fifty college students 
and then divided the class into equal groups
and repeated the test. One of the groups
was praised and the other was reproved for
performance on the initial test. The praised
group improved 79 per cent, while the re- 
proved group deteriorated in the second test. 
In this second group the better students of 
the first test deteriorated, while the poorer 
students improved. 


Hurlock also studied group rivalry as an 
incentive to increased efficiency in school 
work, and the relationship of such factors as 
age, sex, and individual peculiarities to the 
effects of group rivalry. The subjects in- 
cluded 155 children of both sexes, from nine 
to twelve years of age. The work consisted 
of modified Courtis Arithmetic Tests. Equiva- 
lent rivalry groups were stimulated by pub- 
licity and discussion, and their achievements 
were compared with those of a control group. 
The rivalry groups were found to outstrip the 
control group in speed and accuracy. Girls, 
younger children, and those of inferior ability 
responded most to the competitive motive. 
This is in line with Forlano’s study of a 
similar age range. 


In her excellent reviews of the literature on 
motivation Hurlock (52) notes the great dis- 
crepancy between work done by children and 
their I.Q.’s, a discrepancy greatest in superior
children. This is ascribed to the lack of a 
necessary incentive. Hurlock cites a number 
of studies to prove this point (75). This is 
intelligible in the light of the many experi- 
mental studies of competition which show 
that the more rapid workers are retarded by 
the tempo of the average group, while the 
slower are speeded up. However, the provi- 
sion of incentives will accomplish very little 
so long as the conditions of the basic proc- 
esses of group competition are not applied. 

Weston and English (118) studied the in- 
fluence of the group on psychological test 
scores, using two composite test forms con- 
taining items from Thurstone’s “Reasoning” 
and “Interpretation” tests and from Brig- 
ham’s “Opposite Test”. Ten upper-classmen 
served as subjects, working in two equal 
groups, in reverse order as to solitary or group 
application to the tests. Other groups of 
varying size were tested individually and in 
a group by another series of differential 
mental tests. The first experiments show a 
definite improvement of test scores under 
social stimulation. The results of the second 
series of experiments show but slight
improvement. 


It is clear that more experiments are nec- 
essary to describe the actual effects of group 
stimulation upon the processes involved in 
intelligence tests. It is especially desirable to 
have a large number of subjects, and to make 
the tests for the group and the individual in 
solitude as nearly alike in the kind and de- 
gree of intelligence involved as possible. One 
should guard against assuming too close a re- 
lationship between objective situations and 
the mental processes elicited by the situations. 
One situation may arouse a variety of differ- 
ent mental processes. Tests should not be 
confined to upper-classmen, or to groups of 
any single constitution. Nor can any really 
useful conclusions be formed except on the 
basis of a large number of experiments under 
a variety of conditions. 

It is significant that the results of Farns- 
worth (35) and Anderson (7) contradict 
those of Weston and English. In a very 
carefully conducted experiment Farnsworth 
repeated the work of Weston and English 
under more controlled conditions. He ad- 
ministered the tests once in isolation and once 
in the group, eliminating practice effects by 
using equivalent groups and by rotating the 
order of isolation and group conditions. No 
significant differences in mean group scores 
for the two conditions were found. There 
was, however, a consistent increase in varia- 
tion of performance under group conditions, 
and a slight tendency for the individual work- 
ing in isolation to miss fewer of the difficult 
items. In short, the group may affect indi- 
viduals adversely with difficult problems. 
Social stimulation may disrupt as well as 
facilitate, in which respect it is comparable 
to punishment (114, 113). 
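
The logic of rotating the order of conditions, as Farnsworth is described as doing, may be sketched as follows. This is only an illustration under stated assumptions: the subject labels are hypothetical, and a seeded random split stands in for whatever procedure was actually used to form equivalent groups, which the text does not describe. Half of the subjects meet the isolation condition first and the group condition second; the other half meet them in the reverse order, so that any gain from practice falls equally on both conditions.

```python
import random

def counterbalance(subjects, seed=0):
    """Split subjects into two halves that take the conditions in opposite orders."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random split; a stand-in for matching equivalent groups
    half = len(shuffled) // 2
    schedule = {}
    for s in shuffled[:half]:
        schedule[s] = ("isolation", "group")
    for s in shuffled[half:]:
        schedule[s] = ("group", "isolation")
    return schedule

# Eight hypothetical subjects, labeled S01 to S08.
subjects = [f"S{i:02d}" for i in range(1, 9)]
schedule = counterbalance(subjects)
for s in sorted(schedule):
    print(s, schedule[s])
```

Because every subject serves under both conditions and each order is used equally often, a difference in mean scores between conditions cannot be attributed to practice alone.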

Anderson was not concerned with the effects 
of the group on mental test scores, but rather 
with the effects of the group on the behavior 
of individuals of considerably different I.Q.’s.
Two groups of five senior high school boys 
took part in experiments involving addition, 
cancellation, and marble sorting. The I.Q.’s 
of one group were between 125 and 130, 
while those of the other group were between 
100 and 105. A general tendency for the
brighter children to be inhibited or slowed
down by the presence of others was apparent,
although accuracy in this group was more
favorably affected than in the group of average
intelligence. Anderson suggests that whatever
effect the sight or sound of others at
work may have as a social stimulus depends 
on the habits of work and the habitual pat- 
terns of response to the presence of other 
people which mark the component members 
of the group. 


Maller and Zubin (69) studied the effects 
of competition or rivalry on intelligence test 
scores, utilizing alternate forms of the same 
test for initial and final testing. Forty-two 
children, divided into equivalent groups, were 
observed. Some of the more important con- 
clusions of the study follow: The repetition 
of an intelligence test under a very strong 
rivalry incentive caused no greater gain in 
score than repetition under normal conditions. 
The incentive brought about an increase in 
the number of items attempted, but there 
were more errors so that the total score was 
not affected. There were, however, impor- 
tant differences between the subtests in these 
respects. On the speed test (comparison of 
numbers) there was an increase in score, while 
in the power test (analogies) there was a 
decrease in score under the incentive condi- 
tions. The various intercorrelations and self 
correlations of performance increased under
the influence of rivalry and the scores were
more variable. 


The findings of Maller and Zubin agree in 
general with those of Farnsworth and Ander- 
son, and the majority of experiments seem 
to contradict the notion that competition or 
indeed any form of social stimulation im- 
proves mental test scores. Indeed, as tests 
increase in difficulty, requiring more and more 
deliberate and original reflection rather than 
speed, we might expect the reverse. Excep- 
tions may be expected now and then, espe- 
cially under conditions and in situations 
where competitive and cooperative attitudes 
have been given special attention. 


In addition to these experiments bearing 
directly on problems of educational impor- 
tance, other studies may be mentioned that 
are related rather indirectly to the problem 
of competition. These include the experi- 
ments and semi-experimental studies of Bar- 
ton (10), Breed and McCarthy (14), Charles 
(25), Jensen and Jensen (55), McCrory (75), 
and Turney (112). These studies contain 
data or suggestions of value to the student of 
social stimulation and response.


INDUSTRIAL


The approach of industrial psychologists to 
the problem of competition may be illustrated 
by a quotation from Muscio (87). “Two 
companies who were engaged in ditch digging
decided on a competition. The men of
one company worked continuously until rest
was imperative. Those in the other company
were divided into three sets, each of which
worked five minutes and then rested ten. The
result was that the latter company won an 
easy victory.” Competition of such a char- 
acter is often mentioned in the early litera- 
ture of industrial psychology. The interest, 
however, is in other factors than competition.
In this case the effects of fatigue and rest- 
pause were the objects of primary interest. 
Many of the early studies, using the ergo- 
graphic technique, involved competitive con- 
ditions, but the primary interest was in the 
character of fatigue. 


Féré (36) was one of the first investigators
to recognize the psychological effects of com- 
petition and to attempt to measure the effects. 
In 1904 he reported that efficiency was in- 
creased when a subject observed another lift- 
ing his finger although not actually raising a 
weight. Similar observations were made by 
Féré (37) in a much earlier work published 
in 1887. He seems to have been aware of 
the facilitating effects of competition on sim- 
ple functions, and often speaks of his subjects 
competing. 

Applied psychologists soon recognized the 
dynamogenic effects of competition, and as 
early as 1923 Scott (99) discussed in detail 
the effects of competition in industrial situa- 
tions. The general conclusion was drawn 
that competitive urges could be used advan- 
tageously both for the employer and the em- 
ployee. By 1929 Burtt (19) referred to sev- 
eral experimental studies of special value in 
industry. 

Investigations especially relevant to indus- 
trial psychology include those by Moede 
(82, 84), Crawley (27), Whiting and Eng- 
lish (119), Kohler (59), Busse (20), and 
Sorokin (106). Moede’s study has already 
been reported. 

Crawley studied the effects of competitive 
motivation on fatigue of the arm and leg 
muscles. A modified form of the chest weight 
apparatus, found in all gymnasiums, was used 
for measuring fatigue. Four subjects participated
in the experiments. The results
justified the conclusion that “more work was
accomplished when the subjects competed
with a former record and with results visible
than when the same task was performed without
competition and with results screened.”
This accords with the results of studies concerned
with the effect of knowledge of results
on efficiency of performance. The studies by
Judd (57) and Arps (8) may be cited. Competition
with previous records was probably
involved in all cases. Crawley also found
that the extra exertion by the subject in the
competition series showed itself in diminished
output in the second work period, even when
this period followed a four-minute rest.
Competition consumes energy and demands
“greater time for recovery,” a conclusion of
considerable importance for industry.

Whiting and English (119) attempted to 
measure fatigue of the students’ normal day’s 
work by means of physical and mental tests 
of speed, accuracy, and facility. No appre- 
ciable effect of fatigue was shown. Appar- 
ently the “test attitude,” which is essentially 
competitive, was present as a constant factor. 
It appears that the competitive spirit delays 
the onset of fatigue. 

Kohler (59) studied the influence of fel- 
low-workers where cooperation is emphasized 
and competition eliminated as far as possible. 
A weight was lifted in time with a metronome 
by pulling on a bar. Sometimes one subject 
lifted the weight alone with a load of 41 kilo- 
grams; sometimes two subjects lifted the 
weight with a load of 82 kilograms; and some- 
times three subjects lifted the weight with a 
load of 123 kilograms. The metronome 
sounded at intervals of two seconds. The 
score was stated in terms of meter kilograms 
of work done in a specified time. Series were 
run with teams of varied makeup, i.e., pairs 
of equal and unequal strength. When two 
members were equal their performance with 
the heavier weight was only about seventy 
per cent of the average of their individual 
scores. Each tried to lead and the conflict 
of tempo resulted in decreased efficiency. 
Instructions to coöperate had little effect in
eliminating competition. Efficiency increased 
for a time as one worker became superior, 
and an optimal condition was found when one 
was sixty-five to seventy per cent as effective 
as the other. The superior worker set the 
pace and the other followed. The competi- 
tive urge of the weaker was inhibited by the 
stronger.

Sorokin and his associates conducted a
sociological experiment to compare the effects 
of individual and collective attitudes and re- 
wards on children of various ages. Pre-school 
children, kindergarten pupils, and high school
boys, affording an ascending series of ages
and degrees of determination by the social
environment, were used as subjects. The
pre-school children transferred marbles from place
to place; filled, carried, and emptied cups of 
sand; and selected pegs of certain color and 
size from a box of miscellaneous pegs. The 
high school boys carried pails of water from 
place to place and worked at arithmetical 
problems. Remuneration consisted of toys 
and pennies for the children, and money for 
the boys. The reward was individual if at 
the entire disposal of the child, collective if 
at the disposal of the child’s group and not 
to be taken home. It was found that work 
was more efficient: (1) under “individual”
than “collective” remuneration, (2) when the
worker worked for himself than when he
worked for others, (3) under unequal remuneration,
and (4) under overt competition
than when it was lacking. “Pure competition”
without any pecuniary remuneration
was found to be more stimulating than 
“equal” remuneration. The children, how- 
ever, showed considerable individual differ- 
ences in their reactions to these factors. The 
results agree with those of an earlier study by 
Forlano (38) who found that the average 
child works “for individual gain rather than 
class interest.” 

A final study which bears on the subject 
of competition in industrial work is that of 
Busse (20) who claims that every person has 
a rhythm of his own which remains constant 
without being influenced by the factor of 
fatigue. There is a strong tendency to 
mechanization, and in 89 per cent of the cases 
studied rhythm was apparent in group activ- 
ity. Increase of speed was shown when work- 
ing in companionship with fast workers and 
with those of one’s own tempo. Under such 
conditions a new tempo is struck and this 
tempo again becomes constant. In short, a 
competitive situation alters the personal 
rhythm of work. 


PSYCHOLOGICAL STUDIES 


Studies primarily interested in the phe- 
nomena associated with competition include 
those by Triplett (111), Floyd Allport (4), 
Riddle (92), Whittemore (120, 121), Sengupta
and Sinha (100), Travis (110), Simms
(104), Dashiell (30), Ichheiser (54), Makita 
(67), Yamamoto (124), Vaughn (115), 
Vaughn and Geldreich (116), and Lepley 
(62). These studies do not differ in funda- 
mental respects from those in the preceding 
sections. They are, however, mainly inter- 
ested in the psychological phenomena asso- 
ciated with competition rather than the 
applied phases of the subject. Adult sub- 
jects have also been used in the experiments, 
in contrast to the usually juvenile subjects 
in the preceding section. This, of course, is 
probably mainly a matter of convenience. 

One of the earliest experimental studies of 
competition was that of Norman Triplett, 
who seems to be the first psychologist aware 
of the importance of competition as a psycho- 
logical process subject to experimental con- 
trol. We might indeed speak of his discov- 
ery of the dynamogenic effects of competition 
as the “Triplett phenomenon”, since its ob- 
servation really started the psychologist’s work 
in this field. The work of G. Stanley Hall, 
William H. Burnham, and E. C. Sanford, 
however, must be taken into consideration, 
since they undoubtedly exerted important 
influences over their students at Clark 
University. 

Triplett studied the dynamogenic factors 
in pace-making as exhibited by two sets of 
data, one obtained from the records of the 
Racing Board of American Wheelmen, and 
the other obtained through his own experi- 
ments. The records of the Board were used 
to study the effects of three conditions of 
competition on the speed of bicycling. Under 
one condition the contestants raced against 
time only, attempting to lower an established 
record. Under another, the objective was 
the same, but a pacemaker was used to stimu- 
late the contestant. The third condition was 
the real race, including pacemaker and several 
contestants, the aim being to beat other con- 
testants and keep up with the pacemaker. 
The records presented show the dynamogenic 
values of the different conditions to be as fol- 
lows: first, paced competition; second, the 
condition in which a pacemaker is present; 
and, third, racing against time alone. Accord- 
ing to Triplett “the bodily presence of an- 
other contestant participating simultaneously 
in the race serves to liberate latent energy not 
ordinarily available... . The sight of the 
movements of the pacemakers or leading competitors,
and the idea of higher speed furnished
by this or other means, are probably
in themselves dynamogenic factors of some 
consequence. .. .” It should be observed, 
however, as Triplett suggests, that not all are 
affected in the same way by the pacemaker: 
an important deviation from the rule is that 
some are adversely affected by the pacemaker.
“As a rule the rider who is fast with a pace
is slow without it, and the converse is 
believed to be true.” 

In experiments conducted at Indiana Uni- 
versity, Triplett obtained competition records 
from 225 persons of all ages and both sexes. 
The records represent speed in turning a fish- 
ing reel. They were obtained under two con- 
ditions, one a control and the other competi- 
tion. Subjects competed directly with one 
another. Forty records are presented. They 
show that the subjects were of three classes: 
(1) some were stimulated to make faster time 
during competition; (2) others suffered in- 
hibition of motion during competition; and 
(3) a few remained relatively unaffected. 
Triplett also observed that some subjects who 
suffered inhibition of motion during the ini- 
tial trials recovered and displayed facilitation. 
Others, however, displayed inhibition throughout
the series of experiments. A few subjects
displayed initial facilitation followed by
inhibition. Triplett ascribed the inhibition 
phenomena to the mental attitude of the sub- 
jects, viz., an intense desire leading to over- 
stimulation. Some age and sex differences in 
the variability or steadiness of performance 
were noted. A breakdown or disruption of 
behavior was observed to occur more often 
among men than women and among the 
young than the old. 

Triplett’s experiments have been described 
at length because they were the first experi- 
ments in the field and set many of the prob- 
lems for later investigations. The same can 
be said for another series of experiments, 
those of Allport on the influence of the group 
on association and thought. The significance 
of this work lies in the author’s attempt to 
exclude all incentives to rivalry not inherent 
in the situation. In this way he revealed the 
intrinsic rivalry or competition common to all 
social situations. The first four experiments 
dealt with “free chain” association. The fifth 
experiment was with controlled association, 
and the sixth with thought processes, e.g., the 
task of offering arguments against selected 
passages of philosophy. Several conclusions 
were drawn by Allport. (1) The presence of
a co-working group is distinctly favorable to
the speed of the process of free association.
From 66 to 93 per cent of the subjects displayed
this beneficial influence of the group.
(2) The benefits of group influence are subject
to variation according to the nature of
the task. In the more mechanical and motor
tasks, such as writing associated words, the
group stimulus is more effective than in the
more highly mental or more purely associa- 
tional tasks such as writing every third or 
fourth word. (3) There are individual dif- 
ferences in susceptibility to the influence of 
the group upon association. One type who 
are nervous and excitable may succumb to the 
distracting elements of group activity and may 
show either no effect or a social decrement. 
(4) In its temporal distribution the beneficial 
effect of the group is greatest in the first part 
of the task and least toward the end of the 
task. (5) There is a tendency for the slow 
individuals to be more favorably affected in 
speed than the more rapid workers by group 
co-activity. There are, however, exceptions. 
(6) The variability in output among indi- 
viduals varies generally with the social influ- 
ence. Hence it is usually greatest in the 
group work. A striking exception to this 
occurs in the test where rivalry is correlated 
with the social increment, and where every 
third or fourth word is written. Here the 
variability is greatest in the solitary work. 
This conclusion is in agreement with those of 
earlier investigators working on different 
processes. (7) There is suggestive but not 
conclusive evidence that the output of associa- 
tions in a group where all the members are 
performing associations in the same category 
is greater than in groups in which the mem- 
bers are divided in the trend of their asso- 
ciations between opposite or controlled cate- 
gories. Allport further states that the factors 
in social influence are: (1) facilitating fac- 
tors, (a) motor facilitation through group 
behavior, and (b) rivalry intrinsic to the 
group; (2) impeding factors such as dis- 
traction, emotion, and over-rivalry. It seems 
clear from this careful study that competition 
in some form is inevitable in every social 
situation. 

Riddle (92), in her interesting study of 
aggressive behavior in small groups, as ob- 
served in poker games conducted in the lab- 
oratory, throws much light on the nature of 
competition. Six college students who were 
members of the same fraternity recorded introspections
pertaining to the motives of all
important moves. The data are too complicated
to summarize briefly, but the genesis,
fluctuations, and objects of rivalry in such 
group activities are well illustrated. 

Whittemore (120, 121) studied the effects 
of rivalry by objective and introspective meth- 
ods. The objective records showed: (1) that 
all individuals did more work when compet- 
ing than when not competing, and (2) that 
students least capable profited most from com- 
petition. The introspective analysis of con- 
sciousness by thirteen Harvard men and four 
Radcliffe women leads to the following conclusions
concerning the nature of “competitive
consciousness.” (1) Competition with
the group at large is less frequent than com- 
petition with a particular individual. That 
member of the group whose skill most nearly 
approaches the skill of a given subject is the 
one who tends to be singled out as his prin- 
cipal rival. (2) Auto-competition plays a 
large part in the competitive efforts of all sub- 
jects; it plays the principal part with some 
of them. (3) The consciousness of competi- 
tive effort is ex post facto, but occasionally it 
breaks through into consciousness as an im- 
mediate experience of the efforts of others, 
or as a recognized comparison, at the moment, 
of the subject’s previous performances. (4)
In most cases the competitive spirit rises dur- 
ing the period of adjustment, but it dies out 
in the long run. (5) Elements of the com- 
petitive attitude sometimes carry over into 
non-competitive periods of work, but there 
are no objective indications of a rise in pro- 
ductivity corresponding to this persistence. 
(6) The proportion of irrelevant ideas is 
higher in competition than in non-competition. 
(7) The game element in competition elim- 
inates some of the boredom of a simple task 
often repeated. (8) About as many subjects 
indicate a preference for non-competitive work 
as for competitive work. There is no relation 
between a preference and success. (9) There 
is some evidence that most subjects undergo 
physiological changes leading to a rise in 
blood pressure, probably attributable to an 
emotional element of excitement, during 
periods of competition. 

Whittemore’s objective records place a lit- 
tle more emphasis on the facilitating effects 
of competition than have other studies, namely 
those by Triplett. This, of course, could very
easily be the result of the personalities of the
subjects who served in the experiments.

Sengupta and Sinha (100) repeated some
of Allport’s work under more controlled con- 
ditions. Five subjects worked at cancelling 
A’s and E’s from a newspaper clipping, in 
isolation and as members of a group. Facili- 
tation appeared when working in a group. 
The authors ascribed this to an attentional 
rather than an emotional factor. The sub- 
jects were more able to concentrate their at- 
tention when working as members of a group 
than when working in isolation. 

Travis (109) repeated some of Allport’s 
experiments on a group of stutterers, and 
found that 80 per cent produced more free 
associations when working alone than when 
working in a group. This agrees with All- 
port’s conclusion that nervous and excitable 
subjects very often display social decrement. 
It is very probable that competitive conscious- 
ness in some types of abnormality may be of 
a painful character and therefore disruptive 
of behavior.

Simms (104) compared the results of com- 
petition against one’s own record and against 
the records of others of equal ability with 
competition as a participant of a group 
against another group. In one experiment 
126 college students worked at substituting 
digits for letters. In a second experiment 
76 college students whose rate and quality 
of reading were known worked on textbook 
learning. Improvement was observed to oc- 
cur more rapidly under the condition which 
allowed the individual to compete with others 
of equal ability. 

Dashiell (30) compared test scores of speed 
and accuracy in multiplication, mixed rela- 
tions, and free serial word associations in a 
large number of social situations. He de- 
fined both competitive and non-competitive 
social situations, and control situations in 
which the subjects worked alone. Analyses 
of Dashiell’s tables show great variations in 
facilitating effects. He notes that the assumption
that speed is facilitated merely by
the presence of co-workers, as a mere ideo-motor
phenomenon, is not supported by the
data. The competitive attitude is much the
more important factor. Even when subjects 
work alone, their knowledge that others are 
working elsewhere may arouse a competitive 
spirit of some degree. 

This is also suggested in an experiment by 
Ichheiser (54) who found that accuracy and 
speed of performance improved in the presence
of a spectator, probably indicating that
a desire to make a good showing, in short to
compete with one’s best past performance,
or with what one thinks is expected by the
observer, is aroused in such a situation. We
might suggest that Laird’s (60) results in his
study of the influence of razzing, and some
of the phenomena noticed by Riddle in connection
with bluffing at poker games, are
partly explicable by the same reason, viz.,
bluffing at games of skill and responses to 
razzing both involve a response to strong oppo- 
sition, the strongest rival player, the most 
annoying razzer, etc. 

Thelin and Scott (108) studied bluffing of 
a very different type, viz., in answering questions
on English Literature in which questions
on imaginary authors and fictitious
works and events were mixed with legitimate 
questions. Here, too, the data indicated 
greater aggressiveness and willingness to com- 
pete on the part of students as compared 
with those who were not students. The rea- 
son for this is suggested in Yamamoto’s (124) 
recent work. 

Indeed, in all situations where active or 
passive or even ideal spectators or an audi- 
ence of any kind is introduced, the change in 
the performance of the subject is no doubt 
partially explicable in terms of an emerging 
attitude of competition. Whether this in- 
creases efficiency is uncertain. Different views 
have been presented. In all probability the 
effect is an individual affair and also depends 
on the character of the social situation and 
the work. 

Makita (67) discusses the modifications of 
what he calls “the level of demand” which 
occur under competitive conditions. Compe- 
tition may be carried into effect through in- 
structions, stimulation, or through fact. Fac- 
tual competition contains both playful and 
genuine serious elements. The fields of each 
differ in dynamical aspects which the author 
tries to demonstrate. 

Another Japanese student, Yamamoto 
(124), carried out a series of five experiments 
to study the manner and frequency of com- 
petition and the views of competitors con- 
cerning the value of their work. The com- 
petitive activities studied were picking up 
beans, shooting, guessing the number of balls 
in a cup, mirror drawing, and solving puzzles. 
The author found that individuals vary with 
changes in the nature of the work from competitiveness
to indifference and coöperation.
The competitive attitude according to Yamamoto
is more closely related to the character
of the work than to that of the worker, a 
conclusion which raises the question concern- 
ing the conditions that give character to work. 
The results of the experiments can be ex- 
pressed as three kinds of transposition of the 
level of demand, viz., (1) where two workers 
are in the same level, (2) where they work 
at a moderately different level, and (3) where 
there are widely different levels of per- 
formance. 

Vaughn (115) studied the effects of three 
conditions of competition on the marksman- 
ship of persons of widely different abilities. 
One condition gave an advantage to high ini- 
tial ability by awarding a gold medal to the 
person making the highest score. Another 
condition gave an advantage to low initial 
ability by awarding a similar gold medal to 
the person showing the greatest improvement 
during the series of experiments. The third 
condition was one in which all subjects were 
given handicaps according to ability demonstrated
during a preliminary period of shooting
involving 300 shots. A gold medal similar
to the other two was awarded to the winner
of this condition. The most general conclusion
issuing from the results is one to the
effect that a person’s opinion concerning the 
possibility of success is an important factor 
in determining the direction of his efforts. 
The better marksmen chose the condition of 
open competition because they knew they 
were good marksmen and had a chance to 
win the particular match. They had little to 
do with the improvement condition because 
they had improved about as much as possible 
over a period of years of practice. This, how- 
ever, did not eliminate the competitive spirit 
during the improvement condition; they 
worked hard, not to win the particular match, 
but to maintain their averages so that their 
scores in open competition would remain high. 
It is rather significant to note, however, that 
the quality of their work was not as good 
under such indirect motivation as was the 
case when they were trying to win by work- 
ing directly under the condition of open com- 
petition. This type of attitude seems very 
often to be the case when people are working 
for one kind of achievement in order to gain 
a related goal. The poorer marksmen chose 
either the condition where improvement was 
the criterion of success or the condition where 
the better marksmen were handicapped. Some 
of the boys who were rather lazy generally,
but possessed of good ability, judged that
they had an unusually good chance to win 
both conditions, especially the improvement 
condition, by working hard. 

A study by Vaughn and Geldreich (116) 
attempted to correlate certain implicit re- 
sponses and personality traits with competi- 
tive attitudes. The results of three series of 
experiments were analyzed. In one case vari- 
ability in behavior, expressed in terms of the 
standard deviation, was related to the com- 
petitive attitude. Behavior was observed to 
be much more variable or erratic under those 
conditions of competition in which the indi- 
vidual desired to win but lacked confidence. 
Confidence or lack of confidence was, there- 
fore, regarded as one of the associates and 
probable causes of variability. A second series
of experiments was arranged for the purpose
of studying peculiarities in the psychogalvanic
response during competition. The
results associated variability in the psycho- 
galvanic response with the competitive atti- 
tude. The conclusion was suggested that the 
magnitude of the psychogalvanic response is 
inversely proportional to confidence; the mag- 
nitude is small when confidence is great and 
large when confidence is small. Confidence 
was observed to oscillate back and forth dur- 
ing a given period of competition. Stated in 
other terms, internal bodily reactions were 
considered to be substitutes for overt behavior 
and to occur in intensified form through the 
inhibition of conation and overt behavior. 
This interpretation has been suggested by 
Aveling (9) and Cattell (22, 23, 24). The 
third series of experiments represented an 
attempt to explain variability in behavior in 
terms of certain personality traits. The re- 
sults of these experiments were more sugges- 
tive than final, but they tended to associate 
such traits as emotional instability, self-insuffi- 
ciency, introversion, dominance, lack of con- 
fidence, and asociability with variability in 
behavior. No one pattern, however, char- 
acterized all individuals. Some apparently 
desired to dominate over others but lacked 
the necessary confidence; others were evi- 
dently emotionally unstable and owed their 
variability to such a trait. Highly submissive
or dominating and confident individuals
were rather uniform or constant in their 
behavior. 

JOURNAL OF EXPERIMENTAL EDUCATION

One of the latest experimental studies represents
an attempt to discover competition
among rats. It is the first experimental study
of competition among animals that has come
to our attention. The biologists have talked 
for nearly a century of competition among 
animals, but studies have been largely of a 
naturalistic order. Lepley (62) accustomed 
pairs of rats to run in a thirty-foot alley with
an effective goal. The rats were matched on 
the basis of preliminary runs. The pairs were 
finally placed in a real competitive situation 
where a reward for the victor was presented. 
No clear evidence of competition was ob- 
tained. Experiments on animal competition, 
however, should continue, and utilize a variety 
of animal subjects. Certainly race horses 
give every evidence of individual competitive- 
ness in the animal group situation of a race. 


MISCELLANEOUS STUDIES 


There are some rather interesting sociologi- 
cal studies of competition, and a few sum- 
maries which should prove to be useful to 
any student in the field. 


Mead (78) has edited a work which con- 
tains thirteen studies of primitive peoples 
which are classified according to whether 
competition, coöperation, or individualism is
emphasized in the ethnic culture examined. 
Certain conclusions are drawn. (1) Strong 
ego development can occur in all three forms 
of society and is not, therefore, dependent on 
competition. (2) Subsistence level is not directly
relevant to the question of how coöperative
or competitive a society will be. The
social conception of success is a more impor- 
tant determinant. There are, however, cor- 
respondences between emphasis on competi- 
tion, a social structure depending on indi- 
vidual initiative, valuation of property for 
individual ends, a single scale of success, and 
egoism on the one hand, and emphasis on 
coöperation, communal initiative, a valuation
of individual security rather than property, 
and rising status on the other hand. The 
work tends to support ideas advanced by 
Mead (79), in an earlier work based on field 
study in which she compared the cultures of 
three neighboring New Guinea tribes. One 
was dominated by coöperation, another by
warlike competition, and a third by aesthetic
motives. 


The exhaustive report on competition and 
coöperation by May, Doob, and others (73)
offers useful preliminary reports on the same 
topic.


SUMMARY


It is apparent that the competitive attitude 
is largely a product of social development, 
but the fact that its appearance is almost 
universal indicates very clearly that the capac- 
ity for its development is present in all human 
beings. The assertion commonly encountered 
in studies of motivation and affective organi- 
zation to the effect that every one inherits 
competitive or dominating urges is open to 
question. The particular form, intensity, and 
objects of competition are largely dependent 
on the nature of the social environment. 
They vary considerably among individuals
and groups, and seem to be dependent on 
the degree of socialization which the indi- 
vidual and the group have achieved. 


The general conclusions pertaining to the 
effects of competitive situations on human 
behavior which issue from experimental 
studies of competition are evident. (1) Com- 
petitive conditions of one sort or another 
generally increase the efficiency of work and 
facilitate learning. (2) A particular form or 
type of competitive situation, however, does 
not affect all individuals in the same way. 
In some cases the competitive spirit is appar- 
ently aroused and the efficacy of work or 
learning is increased. (3) Generally, indi- 
viduals display the competitive spirit and in- 
tensify their efforts to excel under those con- 
ditions and in those situations which promise 
success. Conditions which promise failure 
either disrupt the individual or result in a 
change in the direction of competitive efforts 
so that the opportunity for success is present. 
(4) There is a very strong suggestion to the 
effect that competition of an indirect or imagi- 
nary sort is less effective in facilitating work 
and learning than competition of a direct sort 
where the rivals or goals are plainly visible. 
(5) The results of one series of experiments 
suggest very strongly that the anticipation of 
failure in overt behavior results in the de- 
flection of the energy of conation to internal 
bodily organs and mechanisms, and that these 
become more active than under non-competi- 
tive conditions or conditions that promise 
success. (This conclusion is of considerable 
practical and theoretical importance and 
should be checked very carefully. It sug- 
gests a possible explanation for the genesis 
of many introspective or introverted traits.) 
(6) There is some evidence to the effect that 
the complex mental processes are more easily 
disrupted by intense competitive attitudes 
than are the simpler motor and mental proc- 
esses. This conclusion needs further experi- 
mental verification. 

From a practical point of view the results 
of the various experimental and clinical 
studies of competition are very important. 
The experimental studies have shown very 
clearly that individuals react differently to 
competitive situations. Even in simple tasks 
some are facilitated, while others are inhib- 
ited and disrupted, and there is a very strong 
suggestion to the effect that the complex 
mental processes are involved more in the 
disruptive effects than in those of a facilitat- 
ing character. It is perfectly possible that 
those school subjects and situations in life 
which necessarily involve the complex mental 
processes are suffering from the disruptive 
effects of competition. It is also perfectly 
clear that we can hardly hope to make much 
headway in the cultivation of those desirable 
mental processes which so many prominent 
administrators are talking about today, with- 
out developing the proper kind of adjust- 
mental attitude on the part of one individual 
toward another. Some have contended that 
this attitude should be of the nature of failure 
and submission; others have held with equal 
cogency that the attitude should be one of 
coöperation and altruism. In any case, withdrawal 
and other forms of pathological escape 
from various practical situations seem 
unnecessary. 